Economic Interpreter for Haskell

Introduction

I needed a Haskell interpreter for evaluating students’ exercises.
It had to be fast because I use it online with AJAX like this (edit the expression and hit enter):

Test>
2 :: Integer

The most obvious choice was the hint Haskell interpreter, but I had to learn how to use it economically. In this post I write down what I learnt.

About hint

hint is quite easy to use: it gives us an Interpreter monad with basic actions such as type inference, module imports and evaluation of strings as Haskell expressions. At the end we can run the Interpreter action in IO.
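
For example, a minimal session might look like the sketch below (the demo wrapper and the example strings are mine, just to illustrate the basic actions):

import Language.Haskell.Interpreter

-- a small Interpreter action: import a module, infer a type,
-- evaluate a string, then run the whole thing in IO
demo :: IO (Either InterpreterError (String, String))
demo = runInterpreter $ do
    setImports ["Prelude"]
    t <- typeOf "map (+1)"          -- type inference
    v <- eval "map (+1) [1,2,3]"    -- evaluation of a string
    return (t, v)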

The interesting part is that there is an interpret function which interprets strings as Haskell expressions but does not evaluate them!

interpret :: Typeable a => String -> a -> Interpreter a

(I specialized the type of interpret to make it more comprehensible.)
The second argument is a witness for the type system that the result type is not polymorphic; it is not used during the computation.

So we pass a String in and we can tell what we would like to get back: an Int, an [Int] or a Dynamic value. If there is no type error (no failure in the Interpreter monad), we get an unevaluated value of the given type. This was exactly the solution to my problem.
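
To make this concrete, here is a tiny sketch (the getInt helper is a name of mine; as is hint's witness helper):

import Language.Haskell.Interpreter

-- interpret a string as an Int; the result comes back as an
-- unevaluated thunk ('as :: Int' is only the type witness)
getInt :: String -> IO (Either InterpreterError Int)
getInt expr = runInterpreter $ do
    setImports ["Prelude"]
    interpret expr (as :: Int)

For instance, getInt "sum [1..10]" gives Right 55 once the thunk is forced, and a Left error if the string does not type check as an Int.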

The First Try

First I started a new Interpreter action for each request, in parallel. This was too slow for me because it basically started a new GHCi every time, which has to load its libraries. Try time ghc -e "1+1" on your machine; it will tell you how slow this is.

I show you an ASCII picture of the situation (boxes are interpreter actions; = signs denote interpretation and evaluation of strings):

 ___________
|           |
|    [====] |
|___________|
     ___________
    |           |
    |    [====] |
    |___________|
   ___________________
  |                   |
  |    [============] |
  |___________________|

Batch Processing

Fortunately we can run any IO action in Interpreter with liftIO. I set up a channel and send the strings to the interpreter through the channel. The interpreter sends back the results via mutable variables (these answer-variables are sent to the interpreter paired with the strings).

 _______________________________
|                           
|    [====] [==]  [========] 
|_______________________________
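
In code, this batch-processing setup might be sketched roughly like this (the names startWorker and request are mine; per-request error handling is omitted, so a failing interpret would stop the worker):

import Control.Concurrent (forkIO)
import Control.Concurrent.Chan
import Control.Concurrent.MVar
import Control.Monad (forever)
import Control.Monad.IO.Class (liftIO)
import Language.Haskell.Interpreter

-- a request pairs the string with the answer-variable to write the result to
type Request = (String, MVar Int)

-- one long-running interpreter, fed through a channel
startWorker :: IO (Chan Request)
startWorker = do
    ch <- newChan
    _  <- forkIO $ do
        _ <- runInterpreter $ do
                setImports ["Prelude"]
                forever $ do
                    (expr, answer) <- liftIO (readChan ch)
                    r <- interpret expr (as :: Int)
                    liftIO (putMVar answer r)
        return ()
    return ch

-- a client: send the string paired with a fresh answer-variable, then block on it
request :: Chan Request -> String -> IO Int
request ch expr = do
    answer <- newEmptyMVar
    writeChan ch (expr, answer)
    takeMVar answer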

This is a good solution but something is missing!

Many Requests

I had many strings to interpret in parallel and I wanted to interpret them separately to decrease response time.

First I started several worker threads, each with a constantly running Interpreter action. I noticed that memory consumption grew linearly with the number of worker threads, which was not good because an Interpreter needs quite a lot of memory.

 _______________________________
|                           
|     [==]  [====] [======] 
|_______________________________
 _______________________________
|                           
|    [====]   [==] [========] 
|_______________________________
 _______________________________
|                           
|      [===] [=]   [======] 
|_______________________________

Economic Use

The final solution was straightforward.

I use only one worker thread; I send it the strings through the channel and it sends back unevaluated Dynamic values! Finally, I evaluate the Dynamic values in parallel (in one GHC runtime, so memory consumption does not grow that much).

The throughput of this architecture depends on the speed of GHC's parsing and type checking, which is quite fast for small strings.

 _______________________________
|Interpreter
|    [=] [=]  [=] [=] [=] 
|______|___|____|___|___|_______
       |   |    |   |   |
 GHC   |   |    |   |   [====]
 runtime   |    |   |
       |   |    |   [========]
       |   |    |
       |   |    [=====]
       |   |
       |   [============]
       |
       [=====]

Module reloads need more time with hint, so I try to avoid them in the worker thread: if the next request needs exactly the same module imports, the environment is not cleared.
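
Putting the pieces together, the final architecture might be sketched like this. The names (DynRequest, startDynWorker, requestDyn, forceInt) are mine, error handling is again omitted, and wrapping the string in toDyn is only one possible way to obtain a Dynamic value; it assumes the expression has a monomorphic, Typeable type and that Data.Dynamic is among the requested imports.

import Control.Concurrent (forkIO)
import Control.Concurrent.Chan
import Control.Concurrent.MVar
import Control.Exception (evaluate)
import Control.Monad (unless)
import Control.Monad.IO.Class (liftIO)
import Data.Dynamic (Dynamic, fromDyn)
import Language.Haskell.Interpreter

-- a request carries the imports it needs, the expression, and an
-- answer-variable for the *unevaluated* Dynamic result
type DynRequest = ([ModuleName], String, MVar Dynamic)

startDynWorker :: IO (Chan DynRequest)
startDynWorker = do
    ch <- newChan
    _  <- forkIO $ do
        _ <- runInterpreter (loop ch [])
        return ()
    return ch
  where
    loop ch current = do
        (mods, expr, answer) <- liftIO (readChan ch)
        -- reloading the environment is expensive, so keep it
        -- when the requested imports did not change
        unless (mods == current) (setImports mods)
        d <- interpret ("toDyn (" ++ expr ++ ")") (as :: Dynamic)
        liftIO (putMVar answer d)   -- d is still an unevaluated thunk
        loop ch mods

-- a client: send the request and get the unevaluated Dynamic back
requestDyn :: Chan DynRequest -> [ModuleName] -> String -> IO Dynamic
requestDyn ch mods expr = do
    answer <- newEmptyMVar
    writeChan ch (mods, expr, answer)
    takeMVar answer

-- force the value in the calling thread, so the expensive evaluation
-- happens outside the single worker (hypothetical Int-only helper)
forceInt :: Dynamic -> IO Int
forceInt d = evaluate (fromDyn d (error "not an Int"))

A caller would do something like d <- requestDyn ch ["Prelude", "Data.Dynamic"] "sum [1..100000] :: Int" followed by forceInt d; many such callers can force their results in parallel while the worker is already busy with the next string.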

Experiences

I use the hint interpreter in a Snap server on a 2-core 3.20 GHz Intel Xeon CPU, running a 64-bit FreeBSD operating system. This is not the only application running on this machine. The machine has 2 GB RAM, but my application is allowed to use at most 500 MB, and it typically uses 200-300 MB of memory.

So far it has interpreted approximately half a million strings, at a rate of 3000 strings/hour during the heaviest use. (I did not measure the upper limit.) The throughput could be increased with more worker threads (but then memory consumption would increase too). I have no statistics about response times, but they should be under 0.1 s on average.

Source Code

The source code will be released on HackageDB together with the server application. Stay tuned!
