32leaves.net

Making the world a safer place: self-verifying SMC

We all use code-generators, no matter if we love or hate them (or do both). But for most compilers you simply have to trust that they work correctly and do what they’re supposed to do. They typically do not provide any sort of proof for their correctness. Well, the SMC is different in that regard. It can generate verification code for its C backend.
The idea of self-verifying compilers it not completely new. Several methods have been proposed for this task, proof-carying code and credible compiler to name a few. What I’m doing here is some mixture of both (with a bit more of the credible compiler – altough the SMC is no optimizing compiler, so there is only a single transformation involved). Basically once we generated C code out of the FSM description, we have two (possibly different) FSMs. The one we described and the one SMC generated. The intuitive perception of those two FSMs being equal, is that their languages (dented by \mathcal{L}(F)) are equal. But how do we show that?
For our purpose it’s enough to show that the two transition functions are equal, as we only care about the generated FSM behaving exactly as the described one does. That still leaves us with the problem that we have to unobtrusively check the generated FSM.
Let us define p=(t_1, \dots, t_n), t_i = ((s, \sigma), (s', A_{in}, A_{out}))\in\delta to be a sequence of transitions (called path or trace). We’ll call a path loop free if \forall t: \neg\exists t': t = t' \vee s_t \neq s_{t'} holds – meaning that there are no two transitions within a path having the same start state.
Finally we accept the two FSMs to be equal iff for all loop-free paths in the original FSM, there exists an equivalent path in the generated machine. Following we accept the SMC to work correctly if the two FSMs (orignal one and generated one) are equal.
Generating the traces of the original FSM is a rather trivial task. We already have a good structure to work on (the SMC AST). That’s not the case with the generated FSM. First of all generating the trace must be unobtrusive. We must be sure that our instrumentation does alter the behavior of the FSM. We can accept that a statement of the form

1
printf("12345678");

generally does not alter the behavior, if placed appropriate. So we simply place print statements (with a unique ID) at each transition and action. Then we generate code that causes the FSM to transit along the previously determined paths.
If everything goes right, executing that code should cause a sequence of IDs to be printed. In case the SMC works correctly that sequence should be identical to those produced by traces of the original FSM.

To use this verification mechanism, run the SMC with the cverify backend and execute the generated _checker.sh script. That script either ends with “NO ERRORS FOUND” (which should always be the case), with a list of traces which the generated FSM did not perform. Additionally a checker.log file is generated which contains entries like this:

1
2
3
[ ] AST_Transition(_INIT,RT,List(AST_ConstEqCondition(CMD__MODE__RT)),AST_Actions(Some(List(clr)),None)) AST_Transition(RT,_ERR,List(AST_NegatedCondition(AST_PredicateCondition(command))),AST_Actions(None,None))
[!] A98601 T-93118478 T-1152986909 ;
[?] A98601 T-93118478 T-1152986909 ;

where the line marked by the exclamation mark denote the trace of the original FSM, as opposed to question mark which marks trace of the generated FSM.

Ahh, and there is yet another new feature, the SMC can now generate a NuSMV model. Of course that’s all in the repository.

State machine compiler

As mentioned in one my previous posts, next semester I’m going to participate in a project where we’re supposed to build a new interface to existing roboting hardware. To describe the protocol that is spoken between the computer and that hardware interface, I wanted to use some sort of DFA. I also wanted avoid having to implement that DFA in the different programming languages involved in the project. Especially since such implementations tend to diverge over time.
So I sat down and wrote something I call the state machine compiler. It allows the specification of a so called predicate deterministic finite automaton. Such a PDFA can be compiled to C or Java code – or to whichever language one writes a backend for. I won’t go into detail here, as I’ve written a manual which explains everything in detail.
You can grab the source code from subversion (username: guest, password: guest). As usual it’s published under CC-BY-SA. Be aware that this is considered work in progress, so the generated code may still be far from being optimal.

Scala and building DFA’s from regular expressions

These days I wanted to do something with that cool new shiny functional-object-fusion language for the JVM called Scala [1]. So I decided to write a program which builds a DFA [2] that accepts each word of the language defined by a regular epxression [3] (type 3 language – Chomsky hierarchy [4]). The algorithm for doing this is described in the paper written by J. A. Brzozowski [p1]. It all boils down to computing the so called “characteristic derivations” of a regular expression (side-note: I was pretty suprised to find that the validation of Relax-NG can be done in a similar way: using derivations [5]). Each of these derivations is then represented as a state in the resulting DFA (for a more formal description of the process read [p1] or [p2]). During the construction of the DFA, an equivalance test between regular expressions (called RE’s in future) is required. But this is not a trivial problem [p2].
So what I did was, I tried to simplify/minimalize each regular expression to it’s bare minimum which in turn allows me to compare the RE’s based on their structure. This is more to be seen as a heuristic, as the there is no garantee that the simplify mechanism boils down two equal RE’s to the same structure – but in practice it works pretty well.
Why did I use scala for this? Basically because that was the purpose – I wanted to do something with Scala. But it turned out that it really did get the job done. There are some really nice features features in that language:

just to name a few. Check the Scala language tour [11] for more features.

I’m not going to post a lot of code here, but I simply have to post the parser code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
package net.t32leaves.regexpDFA;
import scala.util.parsing.combinator._

sealed abstract class Expr
case class SymbolExpr(chr: String) extends Expr
case class OrExpr(subsexpr: List[Expr]) extends Expr
case class IterationExpr(subexpr: Expr) extends Expr
case class GroupExpr(subsexpr: Expr) extends Expr
case class ConcatExpr(subexpr: List[Expr]) extends Expr
case object EmptyWordExpr extends Expr
case object EmptySetExpr extends Expr

object RegExpParser extends RegexParsers {

    def expr: Parser[Expr] = expr00
    def expr00: Parser[Expr] = expr01 ~( ("|" ~> expr01)*) ^^ {
      case l~e => if (e.size == 0) l else OrExpr(l :: e)
    }
 
    def expr01: Parser[Expr] = (atom*) ^^ {
      case a => if (a.size == 1) a(0) else ConcatExpr(a)
    }
   
    def atom: Parser[Expr] = ( bracketized | chr )
 
    def bracketized: Parser[Expr] = ( "(" ~ expr00 ~ ")" ~ (("*")?) ) ^^ {
      case _~expr~_~iter => iter match {
        case Some(x) => IterationExpr(GroupExpr(expr))
        case _ => GroupExpr(expr)
      }
    }
 
    def chr: Parser[Expr] = (symb ~ (("*")?)) ^^ {
      case expr~iter => iter match {
        case Some(x) => IterationExpr(SymbolExpr(expr))
        case _ => SymbolExpr(expr)
      }
    }
 
    def symb: Parser[String] = """[a-zA-Z0-9]"""r

    def parse(text : String) = parseAll(expr, text)

}

The idea behind this code is truly amazing: so called parser combinators [12][p3]. In the world of functional programming they’re nothing new (Haskell had them for a while already), but for someone coming from a rather object oriented world, this is something uber-cool. Once you got the hang of those parser combinators, you’ll recognise them as an ingenius idea. The code we’re writting here looks some kind of like BNF, but is indeed valid Scala code. Those

1
2
3
4
... ^^ {
    case ... => ...
    case ... => ...
}

statements do what’s called pattern matching [13] and are used to construct the AST [14].

That pattern matching mechanism can also be used to implement some logic pretty close to it’s formal definition. The paper “Derivatives of Regular Expressions,” [p1] defines a function

    \[\delta\]

which basically determines if the empty word can be produced by a regular expression. It can be implemented in Scala like this

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
def containsEpsilon(expr: Expr): Boolean = {
    return expr match {
        case ConcatExpr(l) => (true /: l)(_ && containsEpsilon(_))
        case OrExpr(l) => (false /: l)(_ || containsEpsilon(_))
        case GroupExpr(l) => containsEpsilon(l)
        case SymbolExpr(_) => false
        case IterationExpr(_) => true
        case EmptyWordExpr => true
        case EmptySetExpr => false
    }
}

def delta(expr: Expr): Expr = {
    return if (containsEpsilon(expr)) EmptyWordExpr else EmptySetExpr
}

which is pretty close to the definition in the paper. The

containsEpsilon(expr: Expr): Boolean

uses two pretty neat concepts of Scala: pattern matching and the high-order function foldLeft [15] which is similar to the inject [16] in Ruby.

After all the computational work, a dot [17] file is generated and rendered to an image using Graphviz [18]. That’s why some of the graphs generated might not look as beautiful as they could.
Let’s have a look at the output of this program. Let our regular expression be

    \[aa\]

. Something really simple indeed. The DFA constructed by the program looks like this:

dfa5031

We can see from that simple example that the diagram needs some additional interpretation.

  1. If the machine reaches the state with a double circle, and there is no further inscription to be read, the machine has accepted the input (hence the outgoing transitions from the double circled state).
  2. The state which has an incoming transition from “nowhere” is the start state
  3. An uppercase E stands for

        \[\varepsilon\]

    (empty word) and an X for

        \[\phi\]

    (empty set)

If you want to give this a shot yourself, I’ve created a little service out of all that. It’s basically just a Jetty [19] wrapped around the program (in Scala of course). That service is hosted on a machine standing right under my desk in my apartment, so I can not garantee for it’s availability.

In order to try this, enter your expression in the box below. The parser accepts the following meta-chars:

*	Iteration
( )	Group
|	Alternative



Disable graph rendering (recommended for ”big” expressions)



In case you want the code, grab it using SVN https://guest:guest@hs.32leaves.net/svn/public/regexpToDFA. Please not the CC-BY-3.0 [20] license of the code.

References

  • [p2] S. Owens, J. Reppy, and A. Turon, "Regular-expression derivatives reexamined," Journal of Functional Programming, vol. 19, iss. 2, pp. 173-190, 2009. Go to document
    @article{p2,
      author = {Scott Owens and John Reppy and Aaron Turon},
      title = {Regular-expression derivatives reexamined},
      journal = {Journal of Functional Programming},
      publisher = {Cambridge University Press},
      address = {Cambridge, England},
      volume = 19, number = 2, year = 2009, pages = {173-190},
      topic = {implementation},
      url = {http://www.ccs.neu.edu/home/turon/re-deriv.pdf}
    }
  • [p3] P. Wadler, "How to replace failure by a list of successes," in Proc. of a conference on Functional programming languages and computer architecture, New York, NY, USA, 1985, pp. 113-128.
    @inproceedings{p3,
      author = {Wadler, Philip},
      title = {How to replace failure by a list of successes},
      booktitle = {Proc. of a conference on Functional programming languages and computer architecture},
      year = {1985},
      isbn = {3-387-15975-4},
      pages = {113--128},
      location = {Nancy, France},
      publisher = {Springer-Verlag New York, Inc.},
      address = {New York, NY, USA},
      }
  • [p1] J. A. Brzozowski, "Derivatives of Regular Expressions," J. ACM, vol. 11, iss. 4, pp. 481-494, 1964. Go to document
    @article{p1,
      author = {Brzozowski, Janusz A.},
      title = {Derivatives of Regular Expressions},
      journal = {J. ACM},
      volume = {11},
      number = {4},
      year = {1964},
      issn = {0004-5411},
      pages = {481--494},
      doi = {http://doi.acm.org/10.1145/321239.321249},
      publisher = {ACM},
      address = {New York, NY, USA},
      url = {http://portal.acm.org/citation.cfm?id=321239.321249&coll=GUIDE&dl=ACM&CFID=37630832&CFTOKEN=98811427}
    }

Fork me on GitHub