# IRP programming : an efficient way to reduce inter-module coupling

Jump to: navigation, search

DOI: 10.13140/RG.2.1.3833.0406

# Abstract

(this text is a draft)

# Introduction

Let suppose we already have a computer program able to calculate, some entities S1, .., Sn and that we want to add to it a new functionality : the computation of a new entity A by applying a known function a_build to S1, .., Sn.

The usual or direct way of programming the computation of A is the following :

1. get the values of the arguments S1, ..., Sn.
2. apply a_build to them.

As you know, step 1 is rather difficult. It needs a deep knowledge of the code and is rarely straightforward. The Si calculation are usually embedded in part of code doing something else. Moreover each of them needs the values of other entities T1, ... Tm to be computed etc... .

It is clear that everything would be much simpler if the x_build functions had no arguments.

It is exactly what the IRP way of programming is doing :

It consists in encapsulating the gathering of the calculation of the arguments Si and the application of a_build on them in an intermediate A-specific and argumentless function a_provide which looks like :

 function a_provide {

S1 = s1_provide
....
Sn = sn_provide

a_build (S1, ... , Sn)

}


As you wee, each Si is also calculated by calling its own specific and argumentless function si_provide.

This simple encapsulation trick, avoid the programmer to worry about the calculations of the intermediate values Si, Ti ... Wi. Those calculations are recursively delegated to their specific provide functions until some data is reached.

We can summarized very schematically the program flow by the tree below :

                                       a_provide
/                                         \
s1_provide                  ....               sn_provide
/           \                                  /           \
t1_provide ... tm_provide                      w1_provide ... wk_provide
/               \                              /               \
data_1     ...    data_...                      data_...           data_s


As already pointed out, in programs written according to the direct way strong though unwanted coupling may occur between modules. Some authors even claim that this is unavoidable (see OOSC2 by Bertrand Meyer).

We show here that the IRP way avoids such spurious couplings. This method is called Implicit Reference to Parameters, because it relies on hiding the arguments (parameters) of the functions devoted to the building of the main entities of a program. As such, IRP does not depend on the programming language. Though it requires some additional design effort, it introduces constraints which enforce a more efficient design.

IRP can be applied to reshape step by step an already written code, keeping it running during the whole process. The resulting code is faster, easier to read and extremely easy to maintain and parallelize.[1]

As a consequence, within the frame of IRP, code development can be done by several unconnected teams.

The first part of this paper will describe a very simplified version of the IRP method, based on a simple example. The second part will introduce an improved version. The third part will discuss the consequences of using IRP for functional and object-oriented languages. The fourth part will introduce a general tool for writing programs within the frame of IRP.

# Part I : the basic principles of the IRP method

## Some definitions

### Entity of interest (EI)

In this paper an entity is any element of a program which represents a computable value. Among all entities, some play a major role because they are the skeleton of the program architecture. These entities are called entities of interest or EIs, in order to distinguish them from the vast amount of local and of low interest entities most programs deal with.

Within the frame of IRP, the design of a program consists precisely in defining its EIs. Therefore, they are the abstract representations of the main concepts of the considered (e.g. scientific) domain, namely, all those that are associated to computable values of interest. Only EIs are concerned with the IRP method.

For instance, in a classical molecular dynamics code, the EIs are the entities used to represent a molecule or the components of the force field while in a quantum chemistry code, the EIs are the entities used to represent the wavefunction, the operators, the integrals, etc....

The promotion of an entity to the status of EI is arbitrary. So, it may happen that an entity not taken into account at first by IRP, can later be included and vice-versa. We shall see that, as a consequence of the low inter-module coupling, this has very little impact on the overall development of the code, namely, a change in the status of an entity will never imply a major reorganization and rewriting of the code.[remark 1]

Remark
EIs represent values of entities specific to some domain. Therefore they cannot be functions (which are used to build EIs values and not to represent them, this is done with data structures).

### Modules

A module is an element of the code where an EI is defined and its functionalities implemented. In Fortran90 and in OCaml it is a module, in O-O languages it is a class[remark 2]

### Building function

Let us call a_build, the building function of a given EI named $A$. When applied to the values of its arguments, a_build produces a_value, the value of $A$.

#### Father and sons

When b_value, the value of an EI named Failed to parse (unknown error): B , is an argument of the building function a_build, we will say that $A$ is the father of Failed to parse (unknown error): B and Failed to parse (unknown error): B a son of $A$.

#### Needs relation

A father and any of its sons are connected by the relation "necessary for the building of" noted :

Failed to parse (unknown error): (A~needs~B)

or

Failed to parse (unknown error): (A~\rightarrow~B)

This relation is transitive:

Failed to parse (unknown error): if~(A~\rightarrow~B)~and~(B~\rightarrow~C)~then~(A~\rightarrow~C)

.

but asymmetric

Failed to parse (unknown error): if~(A~\rightarrow~B)~then~\neg~(B~\rightarrow~A)

Because of this relation the way a program written in IRP runs looks very similar to the way a makefile runs.

### Rules 1 and 2

In IRP, the building functions have specific properties :

Rule 1 (unicity of a building function)

for each $A$ there is one and only one building function (a_build) and to each building function a_build corresponds a unique $A$. In other words, there is a bijection between $A$ and a_build.

Rule 2 (unicity of sons list)

the arguments of a building function are the values of a well defined list of EI : its sons. By well defined we mean that the function is not partial : all arguments must have values when the function is applied.

### Provider function

Let us consider the calculation of the value a_value of whichever EI, for example, $A$.
Rule 1 implies that there is only one way to calculate a_value : a call to its unique building function a_build.
While Rule 2 implies that the values of the arguments of a_build (b1_value, b2_value, ..., bn_value) can be calculated by calling the well defined list of n functions b1_build, b2_build, ... ,bn_build, because Rule 1 applies also to $B_1$, $B_2$, ... ,$B_n$ (bi_value = bi_build ( ... ) ).

Therefore :

the execution of a_build and the provision of the values of its arguments can be encapsulated in an argumentless function ( a_provide ), applying (pseudo-)recursively the same mechanism to the arguments entities.

The argumentless function a_provide will look like :

    a_provide
b1_value = b1_provide
b2_value = b2_provide
...
bn_value = bn_provide
result = a_build (b1_value, b2_value, ... , bn_value)
end


and the bi_provide used to calculate the sons entities $B_i$ will look like :

     bi_provide
ci1_value = ci1_provide
ci2_value = ci2_provide
...
cim_value = cim_provide
result = bi_build (ci1_value, ci2_value, ... , cim_value)
end

Because a povider function is argumentless, this mechanism garantees that a_value does not depend on the context in which it is called in the program.

There is no more to be worried about how to provide the values of the arguments of a_build : they will be recursively computed until their specific data are reached (see examples below). Therefore, the main difficulty of the standard way[2] of programming is avoided.

It must be emphasized that for using a function bi_provide nothing has to be known about the way a Bi have been computed. The only requirement is that someone has previously implemented the code of bi_provide somewhere in the program.

Rule 1 and Rule 2 ensure that the code will work in any case or fail at link if one of the provide function has not been implemented (Note that in Caml the failure will even occur at compilation).

#### Provider Summary in Caml

 let provide tag =
let tag_son_list = formula tag in
let value_son_list =
List.map Son.provide tag_son_list
in
build value_son_list


## An example for Part I

There are many ways to implement the examples below depending on the language used (see Examples wikifrm:IRP_Programming_Example_OCaml_Triangle) These examples are written in a pseudo-language which does not exit, in the simplest way as possible.

### A program to print the surface of a triangle

We want to write a program which, given the 9 floating-point values defining the coordinates of the 3 vertices of a triangle, will calculate its surface and print it.

#### Design of the program

We proceed from top to bottom, designing each needed EI one after the other.

#### Design of a Triangle

The EI Triangle is designed as a data structure containing the couple Point (summit) and Segment (basis)

$\rm Triangle} : \begin{cases} {\rm Point} \\ {\rm Segment} \end{cases$

A Triangle :

Point (summit)

A
/ \
/   \
/     \
B------>C

Segment (basis)


So, within the frame of this design, the EI Triangle has two sons, namely, a Point and Segment.

#### Design of a Segment

The EI Segment is designed as a data structure containing a couple of Points (head and tail)

$\rm Segment} : \begin{cases} {\rm Point} \\ {\rm Point} \end{cases$

A Segment :

A-------->B

Point     Point
(tail)    (head)


The EI Segment has two sons, namely, a couple of Points.

#### Design of a Point

The EI Point is designed as a data structure containing a triplet of floats (x, y and z)

$\rm Point} : \begin{cases} {\rm Float} \\ {\rm Float} \\ {\rm Float} \end{cases$

A Point :

(Float, Float, Float)


So, within the frame of this design, the EI Point has three sons, namely, a triplet of Floats.

#### A possible implementation

##### The provider for a Point
• the building function will look like :
  point_build (x, y, z)
result = data_structure_of_x_of_y_and_z
end


It returns a data structure (a triplet in this case) containing the values of x, y and z.

• the provider function will look like :
  point_provide
x = float_read
y = float_read
z = float_read
result = point_build (x, y, z)
end

• float_read is a function which reads the next float from the input stream. It is at the same time a float_provider and a float_build.
• Note that a float entity is not an EI, it is a data (a concept that will be defined hereafter).
##### The provider for a Segment
• the building function will look like :
  segment_build (a, b)
result = data_structure_of_a_and_b
end


It returns a data structure containing the values of Points A and B.

• the provider function will look like :
  segment_provide
tail = point_provide
head = point_provide
result = segment_build (tail, head)
end

##### The provider for a Triangle
• the building function will look like :
  triangle_build (summit, basis)
result = data_structure_of_summit_and_basis
end


It returns a data structure containing the values of Summit and of Basis.

##### Building a Triangle
• The only way to build a triangle anywhere in the program will be to apply the provider function triangle_provide, which looks like:
  triangle_provide
summit = point_provide
basis = segment_provide
result = triangle_build (summit, basis)
end

• Note that the surface of a triangle, for instance, is not considered as an EI in this design: it is treated as a simple property that can be computed as soon as the triangle is evaluated (as soon as the coordinates of the points are defined). Indeed, it can be calculated by the function triangle_surface which looks like :
  triangle_surface (triangle)
h = triangle_height_length (triangle)
bc = triangle_basis_length (triangle)
result = (h * bc) / 2.
end

##### The main module

the main module is reduced to the calculation of the triangle t by a simple call to triangle_provide followed by the calculation of its surface s.

Then s is printed .

  main
t = triangle_provide
s = triangle_surface (t)

print "surface is ",t
end


### The execution of the program

main
triangle_provide
point_provide
float_read 0.0
float_read 0.0
float_read 0.0
segment_provide
point_provide
float_read 0.5
float_read 1.0
float_read 0.0
point_provide
float_read 1.0
float_read 0.0
float_read 0.0
triangle_surface
surface is 0.25


## Examples of implementations

### Fortran

a Fortran programming environment which helps the development of large Fortran codes by applying the IRP method.

# Part II : The consequences of designing a program as a tree

## Some new definitions

### The production-tree

Being asymmetric and transitive the relation needs forms an acyclic tree. Its vertices are the EIs and its edges the relation itself.

                                         my_triangle:Triangle
|
----------------------------------
/                                  \
summit:Point                          basis:Segment
|                                    |
--------------                   ---------------------
/      |       \                 /                     \
x:Float y:Float z:Float       tail:Point              Head:Point
f1      f2      f3              |                       |
-----------             -----------
/     |     \           /     |     \
/      |      \         /      |      \
x:Float y:Float z:Float x:Float y:Float z:Float
f4      f5      f6      f7      f8      f9

The tree of a Triangle with its 9 calculations conditions
A vertical bar | or a / or a \ means needs


### The target

For example the tree of Root $A$ (the tree of A for short) is drawn below :

                              A
/   \
B1     B2
/ | \    |
C11 C12 C13 C21


$A$ has no father it is also called the target of the program.

### The data

Each path of the tree climbs down until a leaf is reached.
A leaf is a data to be provided from the outside of the program. It is not built it is read : its building function is a read.

For example, suppose C21 were a data then we would have :

    c21_provide
result = c21_read
end


### The nodes

Any EI which is not a leaf is a node.

### A path

We shall use a specific and restrictive definition for a path. In IRP the path of a node N is the list of node descriptors (names for short) encountered when going from the target of the program to the node N (included).

The path of $12$ is ($A$, B1, C12).

The list (B1, C12) is a sub-path of C12.

Here $A$, B1, C12 are node descriptors.

### A sub-tree

The sub-tree of a node N is the tree having the node N as target.
Each node has a unique path and also a unique sub-tree.

#### A value

The sub-tree ends with the data, which, when read, recursively define all the values of the nodes in the same sub-tree.

The values of the leaves of a node are called the calculation-conditions of the node.

Therefore:

Property 1:
In IRP the value of a node is a pure function of its calculation-conditions

## Consequence I : Persistence

From the property 1 above it is easy to make persistent any node N of the program. It is only necessary to built a data-base key from the calculation-conditions of N. These are collected recursively by joining up the calculation-conditions of the sons of N. A dry-run can do that prior to any other calculation. The key provider may look like :

    a_key_provide
b1_key = b1_key_provide
b2_key = b2_key_provide
result = a_key_build (b1_key, b2_key)
end



Once the key is built (for example as a md5 key) the value of the node N can be stored in a data-base using that key. Therefore the provider can be modified in order not to calculate values already calculated elsewhere in the program (or even earlier) :

    a_provide
a_key = a_key_provide
if is_stored (a_key)
then result = retrieve (a_key)
else
b1_value = b1_provide
b2_value = b2_provide
a_value = a_build (b1_value, b2_value)
store (a_key, a_value)
result = a_value
end


## Consequence II : Parallelism

#### Remarks on two possible bad designs

##### a triangle designed as 3 points

Designing a triangle as 3 points, would be too much unprecise : the summit would have been impossible to define a priori then and any of the 3 points would have played the same role which would have been very confusing. By naming each point we have avoided any source of confusion.

##### a triangle designed as a point and a segment

Designing a triangle from a point (the summit) and a segment, is not enough precise : the two points of the segment are equivalent and can be confused.

##### a triangle designed as 3 segments

The same critic as upper can be done here : each segment has to be defined as "basis" "left edge" and "right edge". Furthermore, designing a triangle as 3 segments (AB, BC, CA), would introduce a spurious coupling between the segments : the head a segment is necessarily the tail of an other and its tail the head of the third one.

This can be avoided by doing the following :

                    Triangle  ABC

AC
|
AB
/  |
A   BC
/   \
B     C


1 read "B" and "C" to build BC
2 read "A" and extract "B" out off BC (to enforce the constraint that "B" is shared by AB and BC), build AB
3 extract "A" out off AB, extract BC out off AB then extract "C" out off BC, build AC.

##### a triangle designed as 3 segments

Designing a triangle as 3 segments would be extremely difficult because of the conditions on the triangle length which could not be defined in the design (compilation) but only at the execution.

## Simple examples

### An abstract example

                                  t:T
/\
/  \
/    \
u:U    v:V
/ \    / \
d1  d2 u:U w:W
/ \  |
d1 d2 d3

  d1, d2, d3 are integer.

t_build (u, v)
result = u + v
end

  u_build (d1, d2)
result = d1 + d2
end

  v_build (u, w)
result = u + w
end

  w_build (d3)
result = d3
end

  u_provide
d1 = integer_read
d2 = integer_read
result = u_build (d1, d2)
end

  w_provide
d3 = integer_read
result = w_build (d3)
end

  v_provide
u = u_provide
w = w_provide
result = v_build (u, w)
end

  t_provide
u = u_provide
v = v_provide
result = t_build (u, v)
end

  main
print t
end


# The concepts of Symbols and Formulas : towards automatisation of entity design

## Formulas

A formula is used to translate the symbol of an EI into the symbols of its sons. It is a data_structure F made off son symbols.

By applying a mapping to a formula F we can obtain the same data structure P filled with the son paths and thus calculate the sons values with the provide function of each son.

Example : a molecule tag formula is a tree of fragment tags, a fragment tag a list of block tags.

In Caml a tag is not a data structure it is a "pure" descriptive type it "contains" nothing, while a formula is a data structure which contains the tags of the son entities.

## Conclusion

The basic idea under IRP is very simple get a value wihout any reference to the parameters needed to calculate this value. The consequences on the properties of the resulting code are not and it is possible that not all properties have yet been explored.

Anybody having used IRP is astonished by how easy is programming with it. It woud be even easier if people in charge of writting compilers could implement IRP in their language (i.e. the provide mecanism as part of the language).

An other application could be to write an OS on these principles : the package dependence would then be automatic and the growth of the system would have little effect on its maintainability.

## References

1. mettre la référence de CHAMP
2. Cite error: Invalid <ref> tag; no text was provided for refs named standard_way

## Public notes and remarks

### Remarks

1. We shall see that, as a consequence of the low inter-module coupling, this has very little impact on the overall development of the code, namely, a change in the status of an entity will never imply a major reorganization and rewriting of the code.
2. we shall see that the blending of type and data-structure in a class is confusing

### General questions on the whole text

• localization : ~/sources/irp_top/paper/Reduction_of_inter-module_coupling_using_IRP_programming_scheme.mw