# Audience

- Romain Aissat
- Thibaut Balabonski
- Frédéric Boulanger
- Marie-Claude Gaudel
- Yakoub Nemouchi
- Wolfgang Thumser
- Frédéric Tuong
- Freek Verbeek
- Frédéric Voisin
- Burkhart Wolff
- Lina Ye

# Project

Romain

## Introduction

White-box random-based testing of C programs is very inefficient if we do not detect unfeasible paths, as we will generate too many of them. In this project, Romain:

- implemented in Isabelle a program that, given a control-flow graph
(
*CFG*in short) of a C program, builds a new CFG in which some (hopefully most) unfeasible paths have been pruned; - proved the correctness of this program, that is to say that only unfeasible paths are ruled out.

## The algorithm

The idea of the algorithm is to compute symbolic executions of the source program: this algorithm gathers constraints on (symbolic) variables of the programs, and finally checks that those constraints are indeed satisfiable. If not, it means that this symbolic execution represents an unfeasible path.

Symbolic execution is performed over *configurations*. A
*configuration* is a pair of a symbolic state, assigning symbolic
variables to each variable of the program, and a conjunction of
constraints. Its semantics is the set of possible states that satisfy
the constraints.

The possible instructions are assumptions and assignments (or simply skipping).

Let's consider an example. If we start from the configuration

`([x = x₀, y = y₁], x₀ ≥ 0 ∧ y₁ = y₀ + 1)`

and the two instructions

assume (x ≥ 10) y := y + x

then the symbolic execution is:

([x = x₀, y = y₁], x₀ ≥ 0 ∧ y₁ = y₀ + 1) | | assume (x ≥ 10) V ([x = x₀, y = y₁], x₀ ≥ 0 ∧ y₁ = y₀ + 1 ∧ x₀ ≥ 10) | | y := y + x V ([x = x₀, y = y₂], x₀ ≥ 0 ∧ y₁ = y₀ + 1 ∧ x₀ ≥ 10 ∧ y₂ = y₀ + 1 + x₀)

To handle loops in the source programs, the algorithm search to establish subsumptions between configurations. A configuration is said to be subsumed by another if the states of the first one are states of the second one. As it is not always to possible to establish such a subsumption, configurations (which represent sets of possible program states) need to be enlarged, by relaxing the constraints. This is a tension with the fact that we are willing to remove as many unfeasible paths as possible.

An example of a CFG - simplified merging sort (sms)

The CFG produced by the algorithm for sms

## The Isabelle/HOL formalization

The Isabelle/HOL formalization has to handle two major problems:

- the algorithm might not terminate;
- we need to find the good invariants to finally establish correctness.

The formalization starts with defining arithmetic and Boolean expressions, that are used in configurations. This definition is done through a shallow embedding of these expressions into the Isabelle/HOL logic.

The most involved data-structure is the red-black labeled transition system
(*RB-LTS* in short), the data-structure on which the algorithm
successively operates. It contains the following five components:

- the whole LTS, which is a rooted graph with edges labeled by one of
the three instructions 'skip', 'assume' ans 'assign', referred to as
the
*black LTS*. It actually represents the CFG of the program under analysis; - a subset of it, representing the paths that have been visited so far,
referred to as the
*red LTS*; - the subsumption relation computed so far;
- the initial configuration;
- a function assigning a configuration to each red vertex.

The algorithm is then described as a succession of operations on this red-black LTS that can be of three kinds:

- transforming a black edge whose root is red into a red edge, if it represents a feasible path (using heuristics);
- adding a subsumption edge in the red LTS;
- an abstraction step, that takes a configuration in the red LTS and enlarges it. We restrict this operation to taking an edge at the border of the red LTS, in order to avoid propagating this abstraction right away (in this way, it will be automatically propagated when the algorithm goes on). Note that the program choosing one of the possible abstractions does not need to be proved correct for the whole program to be correct, and thus can be given by an oracle.

The final correctness theorem, formally established in Isabelle/HOL, states that this algorithm does not forget feasible paths: in the end, the configurations represented by the vertices of the black LTS which are not part of the red LTS must be unsatisfiable.