32leaves.net

Word generation using an n-dimensional discrete hypercube

This algorithm is something a friend of mine and I developed quite some time ago (probably a year or so). It might also already be known under some name I'm not aware of. Nonetheless, it's an interesting solution to a quite interesting problem.
Suppose that you wanted to generate the words w \in \Sigma^n over the alphabet \Sigma = \{\alpha_0, \alpha_1, \dots, \alpha_m\}, and that you wanted to do so in a parallel manner, such that you had a set of threads where each thread has a unique number out of \{0, \dots, \vert\Sigma\vert^n - 1\} assigned.
In that case it would be favorable to have a mapping of the form \underbrace{\{k: 0 \leq k < \vert\Sigma\vert^n\}}_S\to\Sigma^n, so that each thread could easily (read: with little computational effort) generate the word it's supposed to work on. Note the implicit definition S:=\{k: 0 \leq k < \vert\Sigma\vert^n\}.
To solve the problem described previously, we start with a function that maps elements of S to points in an n-dimensional discrete hypercube. That mapping V: S\to\{ k: 0\leq k < \vert\Sigma\vert \}^n is defined as follows: V(s) = \begin{bmatrix} \left\lfloor\frac{s}{l^0}\right\rfloor \mod l \\ \left\lfloor\frac{s}{l^1}\right\rfloor \mod l \\ \vdots \\ \left\lfloor\frac{s}{l^{n-1}}\right\rfloor \mod l \end{bmatrix} where l:=\vert\Sigma\vert. This mapping obviously takes a natural number s < \vert\Sigma\vert^n to a point in a discrete n-dimensional hypercube with an edge length of \vert\Sigma\vert. Example: consider \vert\Sigma\vert = 3 and n = 3; V then enumerates the 27 points of a 3\times 3\times 3 cube.
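For a concrete value (my computation): V(5) = \begin{bmatrix} \left\lfloor\frac{5}{1}\right\rfloor \mod 3 \\ \left\lfloor\frac{5}{3}\right\rfloor \mod 3 \\ \left\lfloor\frac{5}{9}\right\rfloor \mod 3 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}, so thread number 5 is assigned the cube point (2, 1, 0).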
Now that we have a mapping from a natural number to a point in the hypercube, we need a mapping from a point in the hypercube to the word itself. Such a mapping W: \{ k: 0\leq k < \vert\Sigma\vert \}^n \to \Sigma^n is easy to find. Consider W(p) := \alpha_{p_1}\alpha_{p_2}\dots\alpha_{p_n} where the p_i are the components of the vector p and \alpha_i \in \Sigma.
So our final mapping F: S\to\Sigma^n is F:=W\circ V, which is exactly what we desired. Note that F must be surjective so that each word is computed, given enough threads. To prove that, we'll show that V as well as W is surjective (since the composition of two surjective functions is surjective as well).
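Before the proofs, here is a minimal C sketch of F (my illustration, not from the original post), assuming the alphabet is given as a char array; each thread would call word_for_index with its assigned number s:

#include <stdio.h>
#include <string.h>

/* F(s) = W(V(s)): compute the word assigned to thread number s. */
void word_for_index(unsigned long s, const char *sigma, int n, char *out) {
    unsigned long l = strlen(sigma);   /* l := |Sigma| */
    for (int i = 0; i < n; i++) {
        out[i] = sigma[s % l];         /* i-th component of V(s): floor(s / l^i) mod l */
        s /= l;                        /* move on to the next power of l */
    }
    out[n] = '\0';
}

int main(void) {
    char w[4];
    for (unsigned long s = 0; s < 27; s++) {   /* |Sigma|^n = 3^3 = 27 words */
        word_for_index(s, "abc", 3, w);
        printf("%lu -> %s\n", s, w);
    }
    return 0;
}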
Theorem: V is surjective, so that \forall v \in \{ k: 0\leq k < \vert\Sigma\vert \}^n : \exists x \in S : V(x) = v.
Proof: Let l:=\vert\Sigma\vert, u_i\in\mathbb{N}, y\in S and v:=\begin{bmatrix} v_1 & \dots & v_n\end{bmatrix}^T. Assume that y = v_1\cdot l^0 + \dots + v_n\cdot l^{n-1}. For the i-th component of V(y) we have \left\lfloor\frac{y}{l^{i-1}}\right\rfloor = v_i + u_i\cdot l with u_i := v_{i+1} + v_{i+2}\cdot l + \dots + v_n\cdot l^{n-i-1}, since the lower-order terms v_1\cdot l^0 + \dots + v_{i-1}\cdot l^{i-2} sum to less than l^{i-1} and vanish under the floored division. Then

V(y) = \begin{bmatrix} v_1 + u_1\cdot l \mod l \\ \vdots \\ v_n + u_n\cdot l \mod l \end{bmatrix} (1)
= \begin{bmatrix} v_1 \mod l & \dots & v_n \mod l \end{bmatrix}^T (2)
= \begin{bmatrix} v_1 & \dots & v_n\end{bmatrix}^T = v (3)

The transition from (1) to (2) is allowed as u_i\cdot l \mod l = 0. The same goes for the equivalence of (2) and (3): v_i \mod l = v_i as v_i < l.

Theorem: W is surjective, so that \forall w \in \Sigma^n : \exists v \in \{ k: 0\leq k < \vert\Sigma\vert \}^n : W(v) = w.
Proof: Let w:=\alpha_{v_1}\dots\alpha_{v_n}. Then the vector v=\begin{bmatrix}v_1 & \dots & v_n\end{bmatrix}^T (which obviously fulfils W(v)=w) must exist, since v_i < \vert\Sigma\vert and thus v\in \{ k: 0\leq k < \vert\Sigma\vert \}^n.
So the mapping F is surjective and does exactly what we want. I consider this solution a particularly elegant one and have wanted to post it for about a year now; finally I found some spare time to do so. If you use this solution, find a mistake, or find it somewhere else (maybe even previously described), please drop a comment.

Raytracing: my first shot

After revisiting the Google talk about splines and fractals I had the idea to do this in three-dimensional space by using quaternions instead of complex numbers. It turned out that it wasn't utterly difficult to render a simple three-dimensional Sierpinski gasket (or Menger sponge) using POV-Ray. At least installing POV-Ray on my Mac took more time than writing the script.
[Image: mengerSierpinski]
While fuzzing around with POV-Ray a crazy idea came to my mind: why not write my own ray tracer? Of course it would only be the simplest one possible, as I only wanted to get a rough understanding of how this stuff works. After doing a bit of research it turned out that it shouldn't be too difficult a task. Hey, this is one of the exercises in every "computer graphics basics" lecture.
Three hours later I had a rough shot, but somehow my maths for calculating the rays was wrong. The day after, a friend of mine and I went to a maths professor at my university and asked him if he could have a quick look at the formulas. His explanations were (as always) very insightful and simply brilliant. With the ideas he gave us we went on to get the missing bits done, and four more hours later we had a working ray tracer. (The Sierpinski gasket was not rendered by my ray tracer, but I can't figure out how to exclude images from the WordPress gallery.)
Of course, after seven hours of work the ray tracer is far from perfect. There are a few neat features missing (which might follow) such as shadows, reflection and radiosity (which would really be a big task). Also, the maths concerning the projection plane is not totally right; we should introduce a coordinate system transformation rather than hacking things together as it is right now. All in all it was a lot of fun to write that thing. If you want to have a look at the source code, check it out here (username: guest, password: guest).
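For illustration (my sketch, not the professor's formulas): the part we initially got wrong was generating the primary rays through the projection plane. A common way to do it, assuming a camera at the origin looking down the negative z-axis and the field of view given in radians:

#include <math.h>

typedef struct { double x, y, z; } vec3;

/* Direction of the primary ray through pixel (px, py); camera sits at the
 * origin looking down -z, fov is the vertical field of view in radians. */
vec3 primary_ray(int px, int py, int width, int height, double fov) {
    double aspect = (double) width / height;
    double scale  = tan(fov / 2.0);
    vec3 d;
    d.x = (2.0 * (px + 0.5) / width - 1.0) * aspect * scale;  /* map pixel to [-1,1] */
    d.y = (1.0 - 2.0 * (py + 0.5) / height) * scale;          /* flip y: image rows grow downwards */
    d.z = -1.0;                                               /* projection plane at z = -1 */
    double len = sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    d.x /= len; d.y /= len; d.z /= len;                       /* normalize the direction */
    return d;
}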

MFCC based feature vectors of my music collection

After reading a set of lecture notes about neural networks I wanted to get my hands on them. So I wrote some rather simple implementations of Hopfield networks [1], multi-layer perceptrons [2] and, last but not least, (serial) Self-Organizing Maps [3] (SOM).
As the Hopfield nets and multi-layer perceptrons worked fine with some artificial test data (hand-crafted patterns for the Hopfield nets and a simple \sin function for the MLP), I needed some test data for the SOMs. The idea was to cartograph my iTunes library, so I used categorical data stored in the library as input vectors for the SOM. It turned out that the data was not suited for this purpose, as it was too evenly distributed and not significant enough to identify meaningful clusters on the resulting U-Matrix [4].

After searching for a more useful way of extracting feature vectors from my music collection I eventually decided to go with so-called "Mel frequency cepstral coefficients" (MFCCs) [5].
According to Wikipedia, the mel scale is "a perceptual scale of pitches judged by listeners to be equal in distance from one another" (Mel scale [6]). Actually converting "normal" frequency into mel frequency is pretty easy:

    \[f_{mel} = 2595 \cdot \log_{10}\left(1 + \frac{f}{700}\right)\]

As a quick sanity check: f = 1000 Hz yields f_{mel} \approx 1000, matching the scale's defining reference point.
I did not find a suitable implementation for calculating those MFCCs, so I wrote my own:
/*
 * stdin: WAV, mono 16-bit 44100 Hz with a standard 48-byte WAV header
 * stdout: ditto
 */

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <math.h>
#include <assert.h>
#include <fftw3.h>

/**
 * The range of the filter bank in Hz
 */

#define F_MIN   20.0
#define F_MAX   11025.0

/**
 * Basically N is the window size in samples. The WAV we're reading is supposed
 * to have a sample rate of 44.1kHz and we want a window size of 0.1sec, so we
 * set this to 4410 (samples).
 */

#define N       4410
#define N2_p1   2206
#define N2_m1   2204

#define SAMPLE_RATE         44100.0
#define SAMPLE_RATE_DIV_N   10.0

/**
 * The number of filters in the mel filter bank (M_NUMCH) and the number of
 * cepstral coefficients to keep (M_COEFF)
 */

#define M_NUMCH 24
#define M_COEFF 20

double melScaleInverse(double f) {
    return 700 * (pow(10, f / 2595.0) - 1);
}

double melScale(double f) {
    return 2595 * log10(1 + (f / 700.0));
}

void buildFilterBank(double* filterBank) {
    int i = 0;

    /* M_NUMCH + 2 boundary frequencies, equally spaced on the mel scale;
     * filter m is the triangle spanning filterBank[m-1] .. filterBank[m+1] */
    double melMin = melScale(F_MIN);
    double melDelta = (melScale(F_MAX) - melMin) / (M_NUMCH + 1);
    for(i = 0; i < M_NUMCH + 2; i++) {
        filterBank[i] = melScaleInverse(melMin + melDelta * i);
    }
}

double filter(double* f /* filter bank boundaries */, int m /* filter index, 1..M_NUMCH */, double f_k) {
   
    if(f_k < f[m - 1] || f_k >= f[m + 1]) {
        return 0;
    } else if(f[m - 1] <= f_k && f_k < f[m]) {
        return (f_k - f[m - 1]) / (f[m] - f[m - 1]);
    } else if(f[m] <= f_k && f_k < f[m + 1]) {
        return (f_k - f[m + 1]) / (f[m] - f[m + 1]);
    } else {
        return 0;
    }
}

void computeFFT() {
    int i, k, nread;
    int16_t *iobuf;                      /* 16-bit PCM WAV samples are signed */
    double *in;
    double filterBank[M_NUMCH + 2];      /* M_NUMCH filters need M_NUMCH + 2 boundary points */
    fftw_complex *out;
    fftw_plan plan;

    iobuf = calloc(N, sizeof(int16_t));
    in = (double*) fftw_malloc(sizeof(double) * N);
    out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
    plan = fftw_plan_dft_r2c_1d(N, in, out, FFTW_MEASURE);

    buildFilterBank(filterBank);

    /* skip the standard 48-byte WAV header (24 16-bit reads) */
    nread = fread(iobuf, sizeof(int16_t), 24, stdin);
   
    double c_ls[N2_m1];
    double c_mf[M_NUMCH];
    double c_mc[M_COEFF];
    while( (nread = fread(iobuf, sizeof(int16_t), N, stdin)) == N ) {
        for(i = 0; i < N; i++) {
            in[i] = iobuf[i];
        }
       
        fftw_execute(plan);
       
        for(k = 0; k < N2_m1; k++) {
            /* normalized magnitude spectrum */
            c_ls[k] = sqrt((out[k][0] * out[k][0]) + (out[k][1] * out[k][1])) / N;
        }
       
        for(i = 0; i < M_NUMCH; i++) {
            c_mf[i] = 0;
           
            for(k = 0; k < N2_m1; k++) {
                c_mf[i] += c_ls[k] * filter(filterBank, i + 1, k * SAMPLE_RATE_DIV_N);
            }
           
        }
       
        for(k = 0; k < M_COEFF; k++) {
            c_mc[k] = 0;

            /* DCT-II over the log filter-bank energies */
            for(i = 0; i < M_NUMCH; i++) {
                if(c_mf[i] != 0) c_mc[k] += log(c_mf[i]) * cos(k * (M_PI / M_NUMCH) * (i + 0.5));
            }
        }
       
        fwrite(c_mc, sizeof(double), M_COEFF, stdout);
    }

    fftw_destroy_plan(plan);
    fftw_free(in);    /* buffers from fftw_malloc must go back through fftw_free */
    fftw_free(out);
    free(iobuf);
}


int main (int argc, char *argv[]) {
    computeFFT();
    return 0;
}
The implementation above reads raw WAV from stdin and calculates the MFCCs. It's invoked using

lame --decode -m m -t $filename | ./mfcc > $filename.features

and works as follows:

  1. Read the data using a rectangular 100ms window (usually a Hamming window [7] or similar is used). This results in 4410 samples (16 bits wide) as the WAV is supposed to be sampled at 44.1kHz.
  2. Run an FFT [8] on the acquired samples – here we use the FFTW [9] library. Then take the squared magnitude of the transformation results: if Y_k is the k-th transformation result, we use \vert Y_k\vert^2.

  3. Scale the result of step 2 using the mel scale filter bank. The one I used looks like this (this presentation [10] helped me a lot in implementing the filter bank):

    [Image: filterbank]

    We'll call the mel-transformed values c^{mf}_i.

  4. Finally compute the n-th coefficient using a discrete cosine transformation:

        \[c_n = \sum\limits_{i=1}^{M} \log_{10}(c^{mf}_i)\cdot\cos\left(\frac{n(2i-1)\pi}{2M}\right)\]

    where M is the number of filter channels and n = 1,\dots,N with N the dimensionality of the resulting feature vector.

  5. Write the feature vector \vec{c} = (c_1, \dots, c_N)^T to stdout as 8-byte-wide little-endian IEEE 754 double precision floats.

Running the program above on my music collection (roughly 28 gigabytes) took roughly 24 hours and resulted in 487747560 feature vectors. That leaves me with two questions:
  1. How good is the quality of those feature vectors?
  2. How do I process such a vast amount of data?
In order to get an idea of the quality (in the context of feeding them into a SOM) I used some simple statistical methods. As the SOM estimates the probability density function of the samples, I had a closer look at the distribution of those vectors in \mathbb{R}^{20}.
The idea is to compute the mean vector

    \[\vec{m} = \frac{1}{N}\sum\limits_{i=1}^N\vec{c}_i\]

then compute the Euclidean distance (as this is the measure of distance a SOM implementation would use) of each vector \vec{c}_i to \vec{m}:

    \[d_i = \|\vec{m} - \vec{c}_i\|\]

Finally we take the mean d_m of the d_i and calculate the deviations \sigma_i = d_m - d_i. Those \sigma_i plotted in a histogram look like this:

[Image: histogram]

The image above leaves us with two possible conclusions:
  1. The music I listen to is pretty much all the same.
  2. The feature vectors are not really of good quality.

Personally, I'd go with the latter, as the music in my library is not that homogeneous. One way to improve the quality of those feature vectors could be to use the first and second derivatives of the cepstral coefficients, as proposed here [11], or to use a Hamming window for reading the data.
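For reference, here is a minimal C sketch of that quality check (my illustration, not from the original post), assuming stdin is redirected from a .features file (a regular, rewindable file) holding 20-dimensional vectors:

#include <stdio.h>
#include <math.h>

#define DIM 20

int main(void) {
    double c[DIM], m[DIM] = {0};
    long n = 0;

    /* first pass: accumulate the mean vector */
    while (fread(c, sizeof(double), DIM, stdin) == DIM) {
        for (int j = 0; j < DIM; j++) m[j] += c[j];
        n++;
    }
    if (n == 0) return 1;
    for (int j = 0; j < DIM; j++) m[j] /= n;

    /* second pass: Euclidean distance of each vector to the mean */
    rewind(stdin);
    double d_sum = 0;
    while (fread(c, sizeof(double), DIM, stdin) == DIM) {
        double d = 0;
        for (int j = 0; j < DIM; j++) d += (m[j] - c[j]) * (m[j] - c[j]);
        d_sum += sqrt(d);
    }
    double d_m = d_sum / n;   /* a third pass would emit sigma_i = d_m - d_i */
    printf("n = %ld, mean distance d_m = %f\n", n, d_m);
    return 0;
}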

Nonetheless, once I finally finish a working batch SOM [12] implementation (and this paper [13]), I'll do another post describing the results. Or I might simply abandon this project, as the semester starts in a few days and I might run out of time.


Spectral graph drawing

Whenever I attend a lecture about some topic, I tend to gather more information about that particular field than the lecture itself provides. This time it's graph theory. While I was reading about graph drawing algorithms, I stumbled upon something called spectral graph theory [2]. To get an idea of what this is all about, I started looking through the presentation on spectral graph theory [2] and reading a few [15] papers [8] on the topic.
Note: please forgive my formal imprecision. If anyone feels "offended" by this, she might want to read the papers instead.
The basic idea behind all this theory is to compute the eigenvectors [5] of the Laplacian matrix [6] of a graph and to sort the eigenvalues in ascending order (thus the eigenvectors are sorted, too). If v_0,\dots,v_n are the eigenvectors and \lambda_0,\dots,\lambda_n the corresponding eigenvalues of the Laplacian matrix of the graph G = (\{k_0,\dots,k_n\}, E), sorted such that \lambda_m \leq \lambda_{m+1} for 0 \leq m < n, then the position vector for vertex k_i is \begin{bmatrix}v_1(i) \\ v_2(i)\end{bmatrix}, where v_k(i) denotes the i-th component of the eigenvector v_k.

Due to reasons explained in the papers [15] [8], this method of graph drawing tends to produce planar [9] drawings. Thus large graphs (especially mesh graphs) tend to look pretty good when drawn like this.
Having understood the theory (at least most of it), I wanted to try things out. So I got some test data [10] and wrote a small Ruby script which converted the PARTY format [11] (or at least a subset of it) into a text-based matrix format, so that Octave [12] was able to read the adjacency matrices [13] of the graphs. Let D denote the degree matrix [14] of the graph and A the adjacency matrix; then the Laplacian matrix is L = D - A. Now we may calculate the eigenvectors of L. The Octave code below loads some graph's adjacency matrix in text format and calculates the (first three) eigenvectors of the Laplacian matrix (basically taken from [15]):

load graph.txt;              % plain-text adjacency matrix, variable "graph"
A = sparse(graph);
L = diag(sum(A)) - A;        % Laplacian: degree matrix minus adjacency matrix
[v,d] = eigs(L, 3, 'sm');    % the three smallest-magnitude eigenpairs
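As a tiny sanity check (my example, not from the original post), consider the path graph on three vertices:

    \[A = \begin{bmatrix}0&1&0\\1&0&1\\0&1&0\end{bmatrix},\quad D = \begin{bmatrix}1&0&0\\0&2&0\\0&0&1\end{bmatrix},\quad L = D - A = \begin{bmatrix}1&-1&0\\-1&2&-1\\0&-1&1\end{bmatrix}\]

Its eigenvalues are \lambda = 0, 1, 3, so eigs(L, 3, 'sm') returns exactly these three smallest-magnitude eigenpairs.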

Then I plotted the graph using gnuplot (to be more precise: using gplot, Octave's built-in graph plotting functionality). The result was already pretty pleasing:

[Image: crack_graph]

But I wanted things to look better, so I wrote a MEX file [16] which plots graphs using OpenCV [17].

#include "mex.h"
#include <opencv/cv.h>
#include <opencv/highgui.h>

#define SCALE  40000
#define IN_ADJ prhs[0]
#define IN_XY  prhs[1]

#define min(x,y) ((x) < (y) ? (x) : (y))
#define max(x,y) ((x) > (y) ? (x) : (y))
#define XY(i)    (int)((xyOffset + xy[i]) * SCALE)

void mexFunction (int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {
   
    if(nrhs < 2) mexErrMsgTxt("expects two parameters: adjacency matrix (sparse, n times n), xy vector (n times 2)");
    if(! mxIsSparse(IN_ADJ)) mexErrMsgTxt("Adjacency matrix must be sparse");
    if(mxGetM(IN_ADJ) != mxGetN(IN_ADJ)) mexErrMsgTxt("Adjacency matrix must be square");
    int n = mxGetN(IN_ADJ);
    int nz = mxGetNzmax(IN_ADJ);
   
    if(mxGetN(IN_XY) != 2 || mxGetM(IN_XY) != n) mexErrMsgTxt("XY vector must be (n times 2)");

    double *xy = mxGetPr(IN_XY);
    double *adj = mxGetPr(IN_ADJ);
   
    // get dimensions
    int i, j;
    double xyOffset = 0, maxX = 0, maxY = 0;
    for(i = 0; i < n; i++) {
        if(xy[i] < xyOffset) xyOffset = xy[i];
        if(xy[i] > maxX) maxX = xy[i];
    }
    for(i = n; i < 2*n; i++) {
        if(xy[i] < xyOffset) xyOffset = xy[i];
        if(xy[i] > maxY) maxY = xy[i];
    }
    xyOffset = -xyOffset;
    maxX += xyOffset;
    maxY += xyOffset;
   
    mexPrintf("img size: %i x %i, xyOffset: %f, maxX: %f, maxY: %f\n", (int)(maxX * SCALE), (int)(maxY * SCALE), xyOffset, maxX, maxY);
    IplImage *img = cvCreateImage(cvSize((int)(maxX * SCALE), (int)(maxY * SCALE)), IPL_DEPTH_8U, 3);

    int *ir = mxGetIr(IN_ADJ);
    int *jc = mxGetJc(IN_ADJ);
   
    int total = 0;
    CvPoint p1, p2;
   
    for (i=0; i<n; i++)  {
        p1.x = XY(i);
        p1.y = XY(n + i);
       
        int starting_row_index = jc[i];
        int stopping_row_index = jc[i+1];
        if (starting_row_index == stopping_row_index) continue;
        else {
            for (j = starting_row_index; j < stopping_row_index; j++)  {
                p2.x = XY(ir[j]);
                p2.y = XY(n + ir[j]);
               
                cvLine(img, p1, p2, cvScalar(255,128,255,0), 1, CV_AA, 0);
               
            //  mexPrintf("\t(%d,%d) = %g || (%i %i) -- (%i %i)\n", ir[j]+1, i+1, adj[total++], p1.x, p1.y, p2.x, p2.y);
            }
        }
    }
   
    if(!cvSaveImage("/tmp/gr.png", img)) mexErrMsgTxt("Unable to write /tmp/gr.png");
    cvNamedWindow("mainWin", CV_WINDOW_AUTOSIZE);
    cvShowImage("mainWin", img );
    cvWaitKey(20);
    cvReleaseImage(&img);   /* free the image buffer */
   
}

That code was used to produce the following graphs (except for the blue one; my WordPress simply can't exclude images from a gallery):

With something looking that cool in my hands, I simply had to create posters out of it. So I postprocessed two graph images and ordered prints (the two black and white images). This stuff is fun, and extremely interesting. I'll surely do some further investigation into this topic.


Scala and building DFA’s from regular expressions

These days I wanted to do something with that cool new shiny functional-object-fusion language for the JVM called Scala [1]. So I decided to write a program which builds a DFA [2] that accepts each word of the language defined by a regular expression [3] (a type-3 language in the Chomsky hierarchy [4]). The algorithm for doing this is described in a paper by J. A. Brzozowski [p1]. It all boils down to computing the so-called "characteristic derivatives" of a regular expression (side note: I was pretty surprised to find that the validation of Relax NG can be done in a similar way: using derivatives [5]). Each of these derivatives is then represented as a state in the resulting DFA (for a more formal description of the process read [p1] or [p2]). During the construction of the DFA, an equivalence test between regular expressions (called REs from here on) is required. But this is not a trivial problem [p2].
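For reference, the derivative D_a(r) of an RE r with respect to a symbol a is defined in [p1] by rules of the following shape (standard textbook statement; \delta is the empty-word test implemented further below):

    \[\begin{aligned} D_a(\emptyset) &= \emptyset & D_a(\varepsilon) &= \emptyset \\ D_a(a) &= \varepsilon & D_a(b) &= \emptyset \;\; (b \neq a) \\ D_a(r\,s) &= D_a(r)\,s \mid \delta(r)\,D_a(s) & D_a(r \mid s) &= D_a(r) \mid D_a(s) \\ D_a(r^*) &= D_a(r)\,r^* \end{aligned}\]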
So what I did was try to simplify/minimize each regular expression to its bare minimum, which in turn allows me to compare the REs based on their structure. This is more of a heuristic, as there is no guarantee that the simplification mechanism reduces two equivalent REs to the same structure – but in practice it works pretty well.
Why did I use Scala for this? Basically because that was the purpose – I wanted to do something with Scala. But it turned out that it really did get the job done. There are some really nice features in that language:

just to name a few. Check the Scala language tour [11] for more features.

I’m not going to post a lot of code here, but I simply have to post the parser code:

package net.t32leaves.regexpDFA;
import scala.util.parsing.combinator._

sealed abstract class Expr
case class SymbolExpr(chr: String) extends Expr
case class OrExpr(subsexpr: List[Expr]) extends Expr
case class IterationExpr(subexpr: Expr) extends Expr
case class GroupExpr(subsexpr: Expr) extends Expr
case class ConcatExpr(subexpr: List[Expr]) extends Expr
case object EmptyWordExpr extends Expr
case object EmptySetExpr extends Expr

object RegExpParser extends RegexParsers {

    def expr: Parser[Expr] = expr00
    def expr00: Parser[Expr] = expr01 ~( ("|" ~> expr01)*) ^^ {
      case l~e => if (e.size == 0) l else OrExpr(l :: e)
    }
 
    def expr01: Parser[Expr] = (atom*) ^^ {
      case a => if (a.size == 1) a(0) else ConcatExpr(a)
    }
   
    def atom: Parser[Expr] = ( bracketized | chr )
 
    def bracketized: Parser[Expr] = ( "(" ~ expr00 ~ ")" ~ (("*")?) ) ^^ {
      case _~expr~_~iter => iter match {
        case Some(x) => IterationExpr(GroupExpr(expr))
        case _ => GroupExpr(expr)
      }
    }
 
    def chr: Parser[Expr] = (symb ~ (("*")?)) ^^ {
      case expr~iter => iter match {
        case Some(x) => IterationExpr(SymbolExpr(expr))
        case _ => SymbolExpr(expr)
      }
    }
 
    def symb: Parser[String] = """[a-zA-Z0-9]"""r

    def parse(text : String) = parseAll(expr, text)

}

The idea behind this code is truly amazing: so-called parser combinators [12][p3]. In the world of functional programming they're nothing new (Haskell has had them for a while), but for someone coming from a rather object-oriented world, this is something uber-cool. Once you get the hang of those parser combinators, you'll recognise them as an ingenious idea. The code we're writing here looks somewhat like BNF, but is indeed valid Scala code. Those

... ^^ {
    case ... => ...
    case ... => ...
}

statements do what’s called pattern matching [13] and are used to construct the AST [14].

That pattern matching mechanism can also be used to implement some logic pretty close to its formal definition. The paper "Derivatives of Regular Expressions" [p1] defines a function \delta which basically determines whether the empty word can be produced by a regular expression. It can be implemented in Scala like this:

def containsEpsilon(expr: Expr): Boolean = {
    return expr match {
        case ConcatExpr(l) => (true /: l)(_ && containsEpsilon(_))
        case OrExpr(l) => (false /: l)(_ || containsEpsilon(_))
        case GroupExpr(l) => containsEpsilon(l)
        case SymbolExpr(_) => false
        case IterationExpr(_) => true
        case EmptyWordExpr => true
        case EmptySetExpr => false
    }
}

def delta(expr: Expr): Expr = {
    return if (containsEpsilon(expr)) EmptyWordExpr else EmptySetExpr
}

which is pretty close to the definition in the paper. The containsEpsilon function uses two pretty neat concepts of Scala: pattern matching and the higher-order function foldLeft [15], which is similar to inject [16] in Ruby.

After all the computational work, a dot [17] file is generated and rendered to an image using Graphviz [18]. That's why some of the generated graphs might not look as beautiful as they could.
Let's have a look at the output of this program. Let our regular expression be aa. Something really simple indeed. The DFA constructed by the program looks like this:

[Image: dfa5031]

We can see from that simple example that the diagram needs some additional interpretation.

  1. If the machine reaches the state with a double circle and there is no further input to be read, the machine has accepted the input (hence the outgoing transitions from the double-circled state).
  2. The state which has an incoming transition from "nowhere" is the start state.
  3. An uppercase E stands for \varepsilon (the empty word) and an X for \phi (the empty set).

If you want to give this a shot yourself, I've created a little service out of all this. It's basically just a Jetty [19] server wrapped around the program (in Scala, of course). The service is hosted on a machine standing right under my desk in my apartment, so I cannot guarantee its availability.

In order to try this, enter your expression in the box below. The parser accepts the following meta-chars:

*	Iteration
( )	Group
|	Alternative






In case you want the code, grab it using SVN: https://guest:guest@hs.32leaves.net/svn/public/regexpToDFA. Please note the CC-BY-3.0 [20] license of the code.

References

  • [p1] J. A. Brzozowski, "Derivatives of Regular Expressions," J. ACM, vol. 11, no. 4, pp. 481–494, 1964. doi:10.1145/321239.321249
  • [p2] S. Owens, J. Reppy, and A. Turon, "Regular-expression derivatives reexamined," Journal of Functional Programming, vol. 19, no. 2, pp. 173–190, 2009. http://www.ccs.neu.edu/home/turon/re-deriv.pdf
  • [p3] P. Wadler, "How to replace failure by a list of successes," in Proc. of a Conference on Functional Programming Languages and Computer Architecture, New York, NY, USA, 1985, pp. 113–128.

Hello CUDA

As I wrote in my last post, we started fiddling around with CUDA today. First thing to notice: the demos are just totally awesome (like this particle demo here). We're talking teraflops of computing power here. After doing the basic CUDA exercises (which were really helpful) I started to implement a CA again. And guess which one: the A2, of course.

// includes, system
#include <stdio.h>
#include <assert.h>

// Simple utility function to check for CUDA runtime errors
void checkCUDAError(const char *msg);

__device__ void get_field(int* d_res, dim3 d_pos, int *d_p0, int d_size) {
    (*d_res) = 0;
    /* negative coordinates wrap around as huge unsigned values and fail the bounds check */
    if(d_pos.x < d_size && d_pos.y < d_size) (*d_res) = d_p0[(d_pos.y * d_size) + d_pos.x];
}

__global__ void caKernel( int *d_p0, int size, int *d_p1 )
{
    int x = (blockIdx.x * blockDim.x) + threadIdx.x;   /* global cell coordinates */
    int y = (blockIdx.y * blockDim.y) + threadIdx.y;

    int sum = 0;
    int v = 0;
    get_field(&v, dim3(x - 1, y - 1), d_p0, size); sum += v;
    get_field(&v, dim3(x - 0, y - 1), d_p0, size); sum += v;
    get_field(&v, dim3(x + 1, y - 1), d_p0, size); sum += v;
    get_field(&v, dim3(x - 1, y - 0), d_p0, size); sum += v;
    get_field(&v, dim3(x + 1, y - 0), d_p0, size); sum += v;
    get_field(&v, dim3(x - 1, y + 1), d_p0, size); sum += v;
    get_field(&v, dim3(x - 0, y + 1), d_p0, size); sum += v;
    get_field(&v, dim3(x + 1, y + 1), d_p0, size); sum += v;
   
    d_p1[y * size + x] = sum == 2 ? 1 : 0;
}

////////////////////////////////////////////////////////////////////////////////
// Program main
////////////////////////////////////////////////////////////////////////////////
int main( int argc, char** argv)
{
    if(argc < 3) {
        fprintf(stderr, "usage: %s <size> <gens>\n", argv[0]);
        return -1;
    }
    int size = atoi(argv[1]);
    int gens = atoi(argv[2]);

    printf("Calculating %i generations for a %ix%i (=%i cells) poulation\n", gens, size*size, size*size, size*size*size*size);

    // pointer for host memory
    int *h_a;

    // pointer for device memory
    int *d_a;
    int *d_b;

    size_t memSize = size * size * size * size * sizeof(int);
    h_a = (int *) malloc(memSize);
    cudaMalloc( (void**)&d_a, memSize  );
    cudaMalloc( (void**)&d_b, memSize  );

    memset(h_a, 0, memSize);
    int i; for(i = 0; i < size*size; i++) h_a[i] = i%2;
    cudaMemcpy( d_a, h_a, memSize, cudaMemcpyHostToDevice );

    dim3 dimGrid( size, size );
    dim3 dimBlock( size, size );
    for(i = 0; i < gens; i++) {
        caKernel<<< dimGrid, dimBlock >>>( i % 2 == 0 ? d_a : d_b, size * size, i % 2 == 0 ? d_b : d_a );
    }

    // block until the device has completed
    cudaThreadSynchronize();

    // check if kernel execution generated an error
    checkCUDAError("kernel execution");

    cudaMemcpy( h_a, gens % 2 == 0 ? d_a : d_b, memSize, cudaMemcpyDeviceToHost );   /* the last generation is in d_a when gens is even */

    // Check for any CUDA errors
    checkCUDAError("cudaMemcpy");

    // free device memory
    cudaFree(d_a);
    cudaFree(d_b);

    // free host memory
    free(h_a);

    return 0;
}

void checkCUDAError(const char *msg)
{
    cudaError_t err = cudaGetLastError();
    if( cudaSuccess != err)
    {
        fprintf(stderr, "Cuda error: %s: %s.\n", msg, cudaGetErrorString( err) );
        exit(-1);
    }                        
}
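The program is compiled and run like this (assuming the source is saved as ca.cu; a size of 4 yields the 16×16 population used for the timing below):

nvcc -o ca ca.cu
./ca 4 4096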

The implementation is roughly based on those exercises. What makes CUDA that powerful is that it employs massively parallel computing. The time complexity of the algorithm itself is \mathcal{O}(n) (where n = k^2, with k being the population size – the X-axis in the graph below) for computing one generation. But as we can see, the factor by which the time grows is pretty low.

Note: this is all unproven. A formal proof may follow some time (especially for that algorithm being \mathcal{O}(n)).

[Image: Time behaviour]

It's also pretty interesting to compare those results with the time required to do all this computation solely on the CPU. Using the implementation of the A2 I presented two posts ago, computing 4096 generations of a 16\times 16 population takes roughly 0.154s. The CUDA implementation requires roughly 0.044s for the same work – a speedup of roughly 3.5\times. I used an nVidia GeForce 8600 GT (rev a1) for the CUDA version and a 2GHz Intel Core 2 Duo for the CPU version. This stuff really is fun, but there really is one question left: what to do with all this power?

Louder now: A2

And no, this is not about the Taking Back Sunday EP :)

This is the sound I generated yesterday together with the corresponding cell states. Remember: the lower a tone, the fewer cells are alive, and vice versa. So if one listens and watches very carefully, one can connect the sound to the pictures.
Just wanted to have this on YouTube

Listening to the A2

Short cut: Just listen here:
But if you want to know how this all came to existence, read on.

It got kinda boring today, so I decided to play around with the A2 automaton and reimplemented it once again, this time in C:

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
typedef uint16_t uint_16;   /* one row of the 16x16 field, one bit per cell */

int get_cell(uint_16 *m, int x, int y);   /* used by neighborhood_aliveness before its definition */

int neighborhood_aliveness(uint_16 *m, int x, int y) {
    int s = 0, i = 0, j = 0;

    for(i = 0; i <= 2; i++)
        for(j = 0; j <= 2; j++)
            s += get_cell(m, (x + 1) - j, (y + 1) - i);
           
    return s;  
}

void set_cell(uint_16 *m, int x, int y, int val) {
    if(x > 15 || y > 15 || x < 0 || y < 0) return;
   
    if(val > 0) m[y] |= (1 << x);
    else m[y] &= ~(1 << x);
}

int get_cell(uint_16 *m, int x, int y) {
    return x > 15 || y > 15 || x < 0 || y < 0 ? 0 : (m[y] & (1 << x)) >> x;
}

uint_16* nextgen(uint_16* m) {
    uint_16 *ng = calloc(16, sizeof(uint_16));   /* zeroed so the bit operations start from a clean row */
   
    int i = 0,j = 0;
    for(i = 0; i < 16; i++) {
        for(j = 0; j < 16; j++) {
            set_cell(ng, j, i, neighborhood_aliveness(m, j, i) == 2 ? 1 : 0);
        }      
    }
    return ng;
}

void print_m(uint_16 *m) {
    int i = 0, j = 0;
    for(i = 0; i < 16; i++) {
        for(j = 0; j < 16; j++) {
            printf("%i", get_cell(m, j, i));
        }
        printf("\n");
    }  
    printf("\n");
}

int count_alive_cells(uint_16 *m) {
    int i = 0, j = 0, s = 0;
    for(i = 0; i < 16; i++) {
        for(j = 0; j < 16; j++) {
            s += get_cell(m, j, i);
        }
    }  
    return s;
}

int main(int argc, char **argv) {
    uint_16 *field = calloc(16, sizeof(uint_16));
    field[0] = 0xAAAA;

    int i = 0;
    for(i = 0; i < 32000; i++) {
        uint_16 *next = nextgen(field);
        free(field);               /* avoid leaking the previous generation */
        field = next;
        printf("%i\t%i\n", i, count_alive_cells(field));
    }

    free(field);
    printf("\n");
    return 0;
}

It's not a particularly beautiful implementation, but it gets the job done. After this was settled, I plotted the number of alive cells per generation.
[Image: fig01]
After doing some more rather standard stuff with data like this (like calculating the standard deviation and plotting it: nothing fancy here) I noticed that the plot kinda reminded me of a wave pattern one can usually find in sound editors. So, why not use the data the A2 gave us to generate some sort of music? No sooner said than done. Once I had SuperCollider downloaded and running, I started to do some tutorials and fiddle around with that massive and powerful tool. Eventually I managed to come up with the following piece of code:

(
s.recSampleFormat = 'int16';
s.recHeaderFormat = 'wav'
)

(
SynthDef(\help_sinegrain,
    { arg out=0, freq=440, sustain=0.5;
        var env;
        env = EnvGen.kr(Env.perc(0.05, sustain, 0.2), doneAction:2);
        Out.ar(out, SinOsc.ar(freq, 0, env))
    }).send(s);
)



(
a = Pseq(#[14, 24, 14, 24, 22, 34, 33, 32, 47, 48, 38, 43, 47, 55, 57, 59, 67, 58, 60, 53, 65, 68, 75, 67, 65, 63, 77, 53, 74, 79, 86, 52, 68, 84, 61, 55, 55, 58, 62, 71, 74, 66, 59, 66, 78, 68, 64, 73, 71, 52, 80, 66, 56, 64, 67, 59, 69, 75, 62, 74, 62, 99, 49, 62, 54, 74, 57, 63, 70, 71, 67, 69, 75, 57, 65, 85, 72, 71, 68, 78, 67, 60, 81, 70, 52, 70, 83, 82, 57, 65, 64, 68, 61, 73, 69, 78, 41, 56, 58, 64, 73, 66, 54, 74, 62, 68, 64, 84, 72, 56, 66, 73, 83, 54, 78, 45, 68, 68, 79, 70, 59, 59, 77, 71, 56, 73, 53, 83, 84, 52, 62, 77, 59, 87, 62, 70, 65, 74, 80, 62, 56, 64, 73, 71, 79, 56, 71, 66, 86, 70, 77, 70, 76, 51, 57, 51, 65, 45, 56, 47, 79, 58, 87, 71, 71, 60, 59, 88, 55, 53, 68, 70, 78, 62, 79, 57, 55, 67, 66, 61, 68, 80, 85, 45, 55, 73, 56, 77, 58, 54, 71, 71, 67, 65, 73, 78, 73, 69, 81, 54, 66], 1).asStream;
Routine({
    loop({
    Synth(\help_sinegrain, [\freq, a.next.midicps]);
    0.2.wait;
    });
}).play();
)

That Pseq contains the first 200 samples from the A2 automaton. The sequence is then used as MIDI notes a few lines below. The outcome actually sounds pretty neat; I did not expect to be able to create such nice-sounding stuff within this short amount of time. But anyway, the A2 is still interesting and SuperCollider is super cool. Alright, I really do need to get some sleep right now. It's like 4 o'clock in the morning, and a fellow student of mine and I want to start playing around with CUDA tomorrow.

Playing around with Markov algorithms

This is just something quick I've wanted to do for about a week now: an interpreter for Markov algorithms. So I wrote two of them: one in Ruby to get an idea of how to do this, and one in Javascript to have it on the web. Both are not particularly beautifully written, but they get the job done. Click this neat little link to get to the online version, or have a look at the Ruby code below (as you can see, the production system and input word are "hard coded"). In both versions the production system removes the trailing zeros from the input word.
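To see how the production system below runs, here is a hand trace (mine, not program output; ε denotes the empty word E, and a leading dot marks a terminating rule) on the shorter input word 100:

ε  -> a  : 100  => a100
a1 -> 1a : a100 => 1a00
a0 -> 0a : 1a00 => 10a0
a0 -> 0a : 10a0 => 100a
0a -> a  : 100a => 10a
0a -> a  : 10a  => 1a
a  -> .ε : 1a   => 1     (terminating rule, the run stops)

In each step the first applicable rule, in the listed order, rewrites the leftmost match.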

E = 0x00
P = [
    { "a1" => "1a" },
    { "a0" => "0a" },
    { "0a" => "a" },
    { "a" => [E] },
    { E => "a" }
]
I = "0101101001110110000"


word = I
while true do
    ruleToRun = nil
    P.each {|rule|
        rule = rule.to_a[0]
        score = word.index(rule[0])
        if (!score.nil? || rule[0] == E) && ruleToRun.nil?
            ruleToRun = rule       
        end
    }
   
    break if ruleToRun.nil?
    shouldStopAfterThisRule = ruleToRun[1].is_a? Array
    ruleToRun[1] = ruleToRun[1][0] if shouldStopAfterThisRule
    oldword = word
    if ruleToRun[0] == E
        word = ruleToRun[1] + word;
    elsif ruleToRun[1] == E
        word = word.sub(ruleToRun[0], "")
    else
        word = word.sub(ruleToRun[0], ruleToRun[1])
    end
    puts "#{ruleToRun[0]} -> #{(shouldStopAfterThisRule ? "." : "")}#{ruleToRun[1].to_s}: #{oldword} => #{word}"
    break if shouldStopAfterThisRule
end
puts word

Simulating fluid dripping on a chain

This is a little afternoon project I did today. It's basically a chain moved by a rotating cylinder. The simulation respects friction and forces and is pretty close to reality, except that the "fluid" is not really fluid but consists of really, really small particles. Everything's written in Java using JBox2D and Processing (utilising the testbed, actually modifying the chain example). What's cool is that basically every parameter can be set. Those parameters are (a small sketch of the coloring computation follows the list):

  • chain ball radius The radius of an element within the chain. This influences the speed at which a ball becomes fully colored.
  • amount of chain balls The number of balls on the chain (this can be set too close or too wide, which may cause the simulation to be unrealistic).
  • bearing radius The size of the bearings.
  • bearing distance The distance between both bearings. Together with the amount of chain balls, this modifies the distance between the balls.
  • motor speed The speed of the motor. Basically one could also modify the motor's torque, but it's set to 10kNm per default so that it doesn't matter.
  • injected particles per second
  • particle size This radius directly changes the speed at which the chain balls become "colored". The colored percentage (ranging from 0 to 1) is calculated using the following equation:

        \[\frac{2}{4\pi\cdot r^2} \cdot \vert P_{recv}\vert \cdot \mbox{size}(p)\]

    where r is the radius of the chain ball, P_{recv} is the set of received particles for this chain ball and \mbox{size}(p) is the particle radius.
  • particle velocity The norm of the velocity the injector produces. The velocity vector is then

        \[v\cdot \begin{pmatrix}0\\-1\end{pmatrix}\]
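As mentioned above the list, here is a minimal C sketch of the coloring computation (my reading of the formula, not the actual Java code):

#include <math.h>

/* Fraction (0..1) by which a chain ball of radius r is colored after
 * receiving `received` particles of radius particle_size. */
double colored_fraction(double r, long received, double particle_size) {
    double f = (2.0 / (4.0 * M_PI * r * r)) * received * particle_size;
    return f < 1.0 ? f : 1.0;   /* clamp to the stated 0..1 range */
}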

This was a nice little project and I'm looking forward to playing around with this stuff a little more (once I've finished playing around with my graph theory studies). Below is a video of the simulation with the following values set:

Parameter Value
chain ball radius 0.5 cm
amount of chain balls 50
bearing radius 5.5 cm
bearing distance 20 cm
motor speed 1.5 cm/s
injected particles per second 50 particles/s
particle size 0.01 cm
particle velocity 1 cm/s
