# IRP programming : an efficient way to reduce inter-module coupling

DOI: 10.13140/RG.2.1.3833.0406

## Contents

- 1 Abstract
- 2 Introduction
- 3 Part I : the basic principles of the IRP method
- 3.1 Some definitions
- 3.2 An example for Part I
- 3.3 Examples of implementations
- 3.4 Consequences : Low coupling beween modules

- 4 Part II : The consequences of designing a program as a tree
- 5 The concept of node-Path and automatisation of iterative processes in IRP method
- 6 The concepts of Symbols and Formulas : towards automatisation of entity design

# Abstract[edit]

(this text is a draft)

# Introduction[edit]

- see also IRP programming : a definition

Let suppose we already have a computer program able to calculate, some entities **S1**, .., **Sn** and that we want to add to it a new functionality : the computation of a *new* entity **A** by applying a known function **a_build** to **S1**, .., **Sn**.

The usual or **direct way** of programming the computation of **A** is the following :

- 1. get the values of the arguments
**S1**, ...,**Sn**. - 2. apply
**a_build**to them.

As you know, step 1 is rather difficult. It needs a deep knowledge of the code and is rarely straightforward. The **Si** calculation are usually embedded in part of code doing something else. Moreover each of them needs the values of other entities **T1**, ... **Tm** to be computed etc... .

*It is clear that everything would be much simpler if the x_build functions had no arguments*.

It is exactly what the IRP way of programming is doing :

It consists in encapsulating the gathering of the calculation of the arguments **Si** and the application of **a_build** on them in an intermediate A-specific and **argumentless** function **a_provide** which looks like :

function a_provide { S1 = s1_provide .... Sn = sn_provide a_build (S1, ... , Sn) }

As you wee, each **Si** is also calculated by calling its own specific and argumentless function **si_provide**.

This simple encapsulation trick, avoid the programmer to **worry about the calculations of the intermediate values Si, Ti ... Wi**. Those calculations are recursively delegated to their specific **provide** functions until some data is reached.

We can summarized very schematically the program flow by the tree below :

a_provide / \ s1_provide .... sn_provide / \ / \ t1_provide ... tm_provide w1_provide ... wk_provide / \ / \ data_1 ... data_... data_... data_s

As already pointed out, in programs written according to the **direct way** strong though unwanted coupling may occur between modules. Some authors even claim that this is unavoidable (see OOSC2 by Bertrand Meyer).

We show here that the **IRP way** avoids such spurious couplings. This method is called **Implicit Reference to Parameters**, because it relies on hiding the arguments (*parameters*) of the functions devoted to the building of the main entities of a program. As such, IRP does not depend on the programming language. Though it requires some additional design effort, it introduces constraints which enforce a more efficient design.

IRP can be applied to reshape step by step an already written code, keeping it running during the whole process. The resulting code is faster, easier to read and extremely easy to maintain and parallelize.^{[1]}

As a consequence, within the frame of IRP, code development can be done by several unconnected teams.

The first part of this paper will describe a very simplified version of the IRP method, based on a simple example. The second part will introduce an improved version. The third part will discuss the consequences of using IRP for functional and object-oriented languages. The fourth part will introduce a general tool for writing programs within the frame of IRP.

- see also Formal definition

# Part I : the basic principles of the IRP method[edit]

## Some definitions[edit]

### Entity of interest (EI)[edit]

In this paper an *entity* is any element of a program which represents a computable value. Among all entities, some play a major role because they are the skeleton of the program architecture. These entities are called **entities of interest** or **EI**s, in order to distinguish them from the vast amount of local and of low interest entities most programs deal with.

Within the frame of IRP, the *design* of a program consists precisely in defining its **EI**s. Therefore, they are the abstract representations of the main concepts of the considered (e.g. scientific) domain, namely, all those that are associated to computable values of interest. Only **EI**s are concerned with the **IRP** method.

For instance, in a classical molecular dynamics code, the **EI**s are the entities used to represent a molecule or the components of the force field while in a quantum chemistry code, the **EI**s are the entities used to represent the wavefunction, the operators, the integrals, etc....

The promotion of an entity to the status of **EI** is arbitrary. So, it may happen that an entity not taken into account at first by IRP, can later be included and vice-versa. We shall see that, as a consequence of the low inter-module coupling, this has very little impact on the overall development of the code, namely, a change in the status of an entity will never imply a major reorganization and rewriting of the code.^{[remark 1]}

- Remark
**EI**s represent values of entities specific to some domain. Therefore they cannot be functions (which are used to build**EI**s values and not to*represent*them, this is done with data structures).

### Modules[edit]

A module is an element of the code where an **EI** is defined and its functionalities implemented. In Fortran90 and in OCaml it is a module, in O-O languages it is a class^{[remark 2]}

### Building function[edit]

Let us call **a_build**, the building function of a given **EI** named . When applied to the values of its arguments, **a_build** produces **a_value**, the value of .

#### Father and sons[edit]

When **b_value**, the value of an **EI** named ** Failed to parse (unknown error): B
**, is an argument of the building function

**a_build**, we will say that is the

*father*of

**and**

**Failed to parse (unknown error): B****a**

**Failed to parse (unknown error): B***son*of .

*Needs* relation[edit]

A father and any of its sons are connected by the relation *"necessary for the building of"* noted :

**Failed to parse (unknown error): (A~needs~B)**

or

**Failed to parse (unknown error): (A~\rightarrow~B)**

This relation is transitive:

**Failed to parse (unknown error): if~(A~\rightarrow~B)~and~(B~\rightarrow~C)~then~(A~\rightarrow~C)**

.

but asymmetric

**Failed to parse (unknown error): if~(A~\rightarrow~B)~then~\neg~(B~\rightarrow~A)**

Because of this relation the way a program written in IRP runs looks very similar to the way a *makefile* runs.

### Rules 1 and 2[edit]

In **IRP**, the building functions have specific properties :

- Rule 1 (unicity of a building function)
- for each there is one and only one building function (
**a_build**) and to each building function**a_build**corresponds a unique . In other words, there is a bijection between and**a_build**.

- Rule 2 (unicity of sons list)
- the arguments of a building function are the values of a well defined list of
**EI**: its sons. By well defined we mean that the function is not partial : all arguments must have values when the function is applied.

### Provider function[edit]

Let us consider the calculation of the value **a_value** of whichever **EI**, for example, .

**Rule 1** implies that there is only one way to calculate **a_value** : a call to its unique building function **a_build**.

While **Rule 2** implies that the values of the arguments of **a_build** (**b _{1}_value, b_{2}_value, ..., b_{n}_value**) can be calculated by calling the well defined list of

**n**functions

**b**, because

_{1}_build, b_{2}_build, ... ,b_{n}_build**Rule 1**applies also to

**, , ... ,**(

**b**).

_{i}_value = b_{i}_build ( ... )Therefore :

*the execution of***a_build***and the provision of the values of its arguments can be encapsulated in an*argumentless*function (***a_provide***), applying (pseudo-)recursively the same mechanism to the arguments entities.*

The argumentless function **a_provide** will look like :

a_provide b_{1}_value = b_{1}_provide b_{2}_value = b_{2}_provide ... b_{n}_value = b_{n}_provide result = a_build (b_{1}_value, b_{2}_value, ... , b_{n}_value) end

and the **b _{i}_provide** used to calculate the sons entities will look like :

b_{i}_provide c_{i1}_value = c_{i1}_provide c_{i2}_value = c_{i2}_provide ... c_{im}_value = c_{im}_provide result = b_{i}_build (c_{i1}_value, c_{i2}_value, ... , c_{im}_value) end

*Because a povider function is argumentless, this mechanism garantees that***a_value***does not depend on the context in which it is called in the program.*

There is no more to be worried about *how to provide the values of the arguments of a_build* : they will be recursively computed until their specific data are reached (see examples below). Therefore, the main difficulty of the

*standard way*

^{[2]}of programming is avoided.

It must be emphasized that for using a function **b _{i}_provide** nothing has to be known about the way a

**B**have been computed

_{i}*. The only requirement is that someone has previously implemented the code of*

**b**

_{i}_provide*somewhere in the program.*

**Rule 1** and **Rule 2** ensure that the code will work in any case or fail at link if one of the provide function has not been implemented (Note that in Caml the failure will even occur at compilation).

#### Provider Summary in Caml[edit]

let provide tag = let tag_son_list = formula tag in let value_son_list = List.map Son.provide tag_son_list in build value_son_list

## An example for Part I[edit]

There are many ways to implement the examples below depending on the language used (see Examples wikifrm:IRP_Programming_Example_OCaml_Triangle) These examples are written in a pseudo-language which does not exit, in the simplest way as possible.

### A program to print the surface of a triangle[edit]

We want to write a program which, given the 9 floating-point values defining the coordinates of the 3 vertices of a triangle, will calculate its surface and print it.

#### Design of the program[edit]

We proceed from top to bottom, designing each needed **EI** one after the other.

#### Design of a Triangle[edit]

The **EI** **Triangle** is designed as a data structure containing the couple *Point* (summit) and *Segment* (basis)

**Failed to parse (unknown error): {\rm Triangle} : \begin{cases} {\rm Point} \\ {\rm Segment} \end{cases} **

ATriangle: Point (summit)A/ \ / \ / \B------>CSegment (basis)

So, within the frame of this design, the **EI** **Triangle** has two sons, namely, a **Point** and **Segment**.

#### Design of a Segment[edit]

The **EI** **Segment** is designed as a data structure containing a couple of *Point*s (head and tail)

**Failed to parse (unknown error): {\rm Segment} : \begin{cases} {\rm Point} \\ {\rm Point} \end{cases} **

ASegment:A-------->BPoint Point (tail) (head)

The **EI** **Segment** has two sons, namely, a couple of **Point**s.

#### Design of a Point[edit]

The **EI** **Point** is designed as a data structure containing a triplet of **float**s (x, y and z)

**Failed to parse (unknown error): {\rm Point} : \begin{cases} {\rm Float} \\ {\rm Float} \\ {\rm Float} \end{cases} **

APoint: (Float,Float,Float)

So, within the frame of this design, the **EI** **Point** has three sons, namely, a triplet of **Float**s.

#### A possible implementation[edit]

##### The provider for a Point[edit]

- the building function will look like :

point_build (x, y, z) result = data_structure_of_x_of_y_and_z end

It returns a data structure (a triplet in this case) containing the values of x, y and z.

- the provider function will look like :

point_provide x = float_read y = float_read z = float_read result = point_build (x, y, z) end

- float_read is a function which reads the next float from the input stream. It is at the same time a float_provider and a float_build.
- Note that a
**float**entity is not an**EI**, it is a**data**(a concept that will be defined hereafter).

##### The provider for a Segment[edit]

- the building function will look like :

segment_build (a, b) result = data_structure_of_a_and_b end

It returns a data structure containing the values of Points **A** and **B**.

- the provider function will look like :

segment_provide tail = point_provide head = point_provide result = segment_build (tail, head) end

##### The provider for a Triangle[edit]

- the building function will look like :

triangle_build (summit, basis) result = data_structure_of_summit_and_basis end

It returns a data structure containing the values of Summit and of Basis.

##### Building a Triangle[edit]

- The only way to build a triangle anywhere in the program will be to apply the provider function triangle_provide, which looks like:

triangle_provide summit = point_provide basis = segment_provide result = triangle_build (summit, basis) end

- Note that the surface of a triangle, for instance, is not considered as an
**EI**in this design: it is treated as a simple property that can be computed as soon as the triangle is evaluated (as soon as the coordinates of the points are defined). Indeed, it can be calculated by the function*triangle_surface*which looks like :

triangle_surface (triangle) h = triangle_height_length (triangle) bc = triangle_basis_length (triangle) result = (h * bc) / 2. end

##### The main module[edit]

the main module is reduced to the calculation of the triangle **t** by a simple call to *triangle_provide* followed by the calculation of its surface **s**.

Then **s** is printed .

main t = triangle_provide s = triangle_surface (t) print "surface is ",t end

### The execution of the program[edit]

main triangle_provide point_provide float_read 0.0 float_read 0.0 float_read 0.0 segment_provide point_provide float_read 0.5 float_read 1.0 float_read 0.0 point_provide float_read 1.0 float_read 0.0 float_read 0.0 triangle_surface surface is 0.25

## Examples of implementations[edit]

### Fortran[edit]

- IRPF90 :

*a Fortran programming environment which helps the development of large Fortran codes by applying the IRP method.*

### Java[edit]

## Consequences : Low coupling beween modules[edit]

# Part II : The consequences of designing a program as a tree[edit]

## Some new definitions[edit]

### The production-tree[edit]

Being asymmetric and transitive the relation **needs** forms an acyclic tree. Its vertices are the **EI**s
and its edges the relation itself.

my_triangle:Triangle | ---------------------------------- / \ summit:Point basis:Segment | | -------------- --------------------- / | \ / \ x:Float y:Float z:Float tail:Point Head:Point f1 f2 f3 | | ----------- ----------- / | \ / | \ / | \ / | \ x:Float y:Float z:Float x:Float y:Float z:Float f4 f5 f6 f7 f8 f9 The tree of a Triangle with its 9 calculations conditions A vertical bar|or a/or a\meansneeds

### The target[edit]

For example the tree of Root (the tree of A for short) is drawn below :

A / \ B_{1}B_{2}/ | \ | C_{11}C_{12}C_{13}C_{21}

has no father it is also called the *target* of the program.

### The data[edit]

Each path of the tree climbs down until a leaf is reached.

A leaf is a **data** to be provided from the outside of the program. It is *not built* it is *read* : its building function is a *read*.

For example, suppose **C _{21}** were a data then we would have :

c_{21}_provide result = c_{21}_read end

### The nodes[edit]

Any **EI** which is not a *leaf* is a node.

### A path[edit]

We shall use a specific and restrictive definition for a path. In **IRP** the path of a node **N** is the list of *node descriptors* (names for short) encountered when going from the target of the program to the node **N** (included).

The path of ** Failed to parse (unknown error): C_{12}
** is (,

**B**,

_{1}**C**).

_{12}The list (**B _{1}**,

**C**) is a

_{12}**sub-path**of

**C**.

_{12}Here , **B _{1}**,

**C**are

_{12}*node descriptors*.

### A sub-tree[edit]

The sub-tree of a node **N** is the tree having the node **N** as target.

Each node has a unique path and also a unique sub-tree.

#### A value[edit]

The sub-tree ends with the **data**, which, when read, recursively define all the values of the nodes in the same sub-tree.

The values of the leaves of a node are called the **calculation-conditions** of the node.

Therefore:

**Property 1:**

*In***IRP**the value of a node is a pure function of its calculation-conditions

## Consequence I : Persistence[edit]

From the **property 1** above it is easy to make persistent any node **N** of the program. It is only necessary to built a data-base key from the calculation-conditions of **N**. These are collected recursively by joining up the calculation-conditions of the sons of **N**. A dry-run can do that prior to any other calculation. The key provider may look like :

a_key_provide b_{1}_key = b_{1}_key_provide b_{2}_key = b_{2}_key_provide result = a_key_build (b_{1}_key, b_{2}_key) end

Once the key is built (for example as a md5 key) the value of the node **N** can be stored in a data-base using that key. Therefore the provider can be modified in order not to calculate values already calculated elsewhere in the program (or even earlier) :

a_provide a_key = a_key_provide if is_stored (a_key) then result = retrieve (a_key) else b_{1}_value = b_{1}_provide b_{2}_value = b_{2}_provide a_value = a_build (b_{1}_value, b_{2}_value) store (a_key, a_value) result = a_value end

## Consequence II : Parallelism[edit]

#### Remarks on two possible bad designs[edit]

##### a triangle designed as 3 points[edit]

Designing a triangle as 3 points, would be too much unprecise : the summit would have been impossible to define *a priori* then and any of the 3 points would have played the same role which would have been very confusing. By naming each point we have avoided any source of confusion.

##### a triangle designed as a point and a segment[edit]

Designing a triangle from a point (the summit) and a segment, is not enough precise : the two points of the segment are equivalent and can be confused.

##### a triangle designed as 3 segments[edit]

The same critic as upper can be done here : each segment has to be defined as "basis" "left edge" and "right edge". Furthermore, designing a triangle as 3 segments (AB, BC, CA), would introduce a spurious coupling between the segments : the head a segment is necessarily the tail of an other and its tail the head of the third one.

This can be avoided by doing the following :

Triangle ABC AC | AB / | A BC / \ B C

1 read "B" and "C" to build BC 2 read "A" andextract "B" out off BC (to enforce the constraint that "B" is shared by AB and BC), build AB 3 extract "A" out off AB, extract BC out off AB then extract "C" out off BC, build AC.

##### a triangle designed as 3 segments[edit]

Designing a triangle as 3 segments would be extremely difficult because of the conditions on the triangle length which could not be defined in the design (compilation) but only at the execution.

## Simple examples[edit]

### An abstract example[edit]

t:T /\ / \ / \ u:U v:V / \ / \ d1 d2 u:U w:W / \ | d1 d2 d3

d1, d2, d3 are integer. t_build (u, v) result = u + v end

u_build (d1, d2) result = d1 + d2 end

v_build (u, w) result = u + w end

w_build (d3) result = d3 end

u_provide d1 = integer_read d2 = integer_read result = u_build (d1, d2) end

w_provide d3 = integer_read result = w_build (d3) end

v_provide u = u_provide w = w_provide result = v_build (u, w) end

t_provide u = u_provide v = v_provide result = t_build (u, v) end

main print t end

# The concept of node-Path and automatisation of iterative processes in IRP method[edit]

# The concepts of Symbols and Formulas : towards automatisation of entity design[edit]

## Symbols[edit]

A symbol names a class of entities having the same nature

### The production-tree as a symbolic descriptor[edit]

### The world of values as a dual of the symbolic world[edit]

## Formulas[edit]

A **formula** is used to translate the symbol of an **EI** into the symbols of its sons. It is a data_structure **F** made off son symbols.

By applying a mapping to a formula **F** we can obtain the same data structure **P** filled with the son paths and thus calculate the sons values with the **provide** function of each son.

Example : a molecule tag formula is a tree of fragment tags, a fragment tag a list of block tags.

In Caml a tag is not a data structure it is a "pure" descriptive type it "contains" nothing, while a formula is a data structure which contains the tags of the son entities.

## Conclusion[edit]

The basic idea under IRP is very simple *get a value wihout any reference to the parameters needed to calculate this value*. The consequences on the properties of the resulting code are not and it is possible that not all properties have yet been explored.

Anybody having used IRP is astonished by how easy is programming with it. It woud be even easier if people in charge of writting compilers could implement IRP in their language (i.e. the provide mecanism as part of the language).

An other application could be to write an OS on these principles : the package dependence would then be automatic and the growth of the system would have little effect on its maintainability.

## References[edit]

- ↑ mettre la référence de CHAMP
- ↑
**Cite error: Invalid**`<ref>`

tag; no text was provided for refs named`standard_way`

## Public notes and remarks[edit]

### Passages to be argumented[edit]

### Passages to be clarified[edit]

### Passages to be commented[edit]

### Passages to be discussed[edit]

### Passages to be emphasized[edit]

### Passages to be examplified[edit]

### Remarks[edit]

- ↑ We shall see that, as a consequence of the low inter-module coupling, this has very little impact on the overall development of the code, namely, a change in the status of an entity will never imply a major reorganization and rewriting of the code.
- ↑ we shall see that the blending of type and data-structure in a class is confusing

### Specific Questions[edit]

### General questions on the whole text[edit]

- localization : ~/sources/irp_top/paper/Reduction_of_inter-module_coupling_using_IRP_programming_scheme.mw