MathematicalOrchid

Reputation: 62818

Speed up runhaskell

I have a small test framework. It executes a loop which does the following:

  1. Generate a small Haskell source file.

  2. Execute this with runhaskell. The program generates various disk files.

  3. Process the disk files just generated.

This happens a few dozen times. It turns out that runhaskell is taking up the vast majority of the program's execution time.

On one hand, the fact that runhaskell manages to load a file from disk, tokenise it, parse it, do dependency analysis, load 20KB more text from disk, tokenise and parse all of this, perform complete type inference, check types, desugar to Core, link against compiled machine code, and execute the thing in an interpreter, all inside of 2 seconds of wall time, is actually pretty damned impressive when you think about it. On the other hand, I still want to make it go faster. ;-)

Compiling the tester (the program that runs the above loop) produced a tiny performance difference. Compiling the 20KB of library code that the scripts link against produced a rather more noticeable improvement. But it's still taking about 1 second per invocation of runhaskell.

The generated Haskell files are just over 1KB each, but only one part of the file actually changes. Perhaps compiling the file and using GHC's -e switch would be faster?

Alternatively, maybe it's the overhead of repeatedly creating and destroying many OS processes which is slowing this down? Every invocation of runhaskell presumably causes the OS to explore the system search path, locate the necessary binary file, load it into memory (surely this is already in the disk cache?), link it against whatever DLLs, and fire it up. Is there some way I can (easily) keep one instance of GHC running, rather than having to constantly create and destroy the OS process?

Ultimately, I suppose there's always the GHC API. But as I understand it, that's nightmarishly difficult to use, highly undocumented, and prone to radical changes at every minor point release of GHC. The task I'm trying to perform is only very simple, so I don't really want to make things more complex than necessary.

Suggestions?

Update: Switching to GHC -e (i.e., now everything is compiled except the one expression being executed) made no measurable performance difference. It seems pretty clear at this point that it's all OS overhead. I'm wondering if I could maybe create a pipe from the tester to GHCi and thus make use of just one OS process...

Upvotes: 15

Views: 2084

Answers (5)

Peter Gammie

Reputation: 31

You might find some useful code in TBC. It has different ambitions - in particular to scrap test boilerplate and test projects that may not compile completely - but it could be extended with a watch-directory feature. The tests are run in GHCi but objects successfully built by cabal ("runghc Setup build") are used.

I developed it to test EDSLs with complicated type hackery, i.e. where the heavy computational lifting is done by other libraries.

I am presently updating it to the latest Haskell Platform and welcome any comments or patches.

Upvotes: 3

Tener

Reputation: 5279

If calling runhaskell takes so much time then perhaps you should eliminate it completely?

If you really need to work with changing Haskell code then you can try the following.

  1. Create a set of modules with varying contents as needed.
  2. Each module should export its main function.
  3. An additional wrapper module should execute the right module from the set based on input arguments. Each time you want to execute a single test, you would pass different arguments.
  4. The whole program is compiled statically.

Example module:

{-# LANGUAGE QuasiQuotes #-}  -- needed for the [str| ... |] quasi-quoter below
module Tester where

import Data.String.Interpolation -- package Interpolation

submodule nameSuffix var1 var2 = [str|
module Sub$nameSuffix$ where

someFunction x = $var1$ * x
anotherFunction v | v == $var2$ = v
                  | otherwise = error ("anotherFunction: argument is not " ++ $:var2$)

|]

modules = [ let suf = (show var1 ++ "_" ++ show var2)  in (suf,submodule suf var1 var2) | var1 <- [1..10], var2 <- [1..10]]

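-- Write each generated module to Sub<suffix>.hs so the file name matches the module name
writeModules = mapM_ (\ (suf,what) -> writeFile ("Sub" ++ suf ++ ".hs") what) modules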
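The wrapper module from step 3 could be sketched roughly as follows. This is only an illustration: the test names and the placeholder actions in `tests` are hypothetical, and in a real setup each entry would refer to the main function exported by one of the generated modules.

```haskell
import System.Exit (die)

-- Hypothetical table mapping a test name to the action that runs it.
-- In real use the actions would be imported from the generated modules.
tests :: [(String, IO ())]
tests =
  [ ("1_1", putStrLn "running test 1_1")
  , ("1_2", putStrLn "running test 1_2")
  ]

-- Look up a test by name, reporting unknown names as an error message.
lookupTest :: String -> Either String (IO ())
lookupTest name =
  maybe (Left ("unknown test: " ++ name)) Right (lookup name tests)

-- Run the test named by the single command-line argument.
dispatch :: [String] -> IO ()
dispatch [name] = either die id (lookupTest name)
dispatch _      = die "usage: tester TEST_NAME"
```

A compiled tester would then use something like `main = getArgs >>= dispatch` (with `getArgs` from System.Environment), so that each test run is a cheap invocation of an already compiled binary.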

Upvotes: 0

Heatsink

Reputation: 7751

If the tests are well isolated from one another, you can put all the test code into a single program and invoke runhaskell once. This may not work if some tests are created based on the results of others, or if some tests call unsafeCrash.

I presume your generated code looks like this:

module Main where
boilerplate code
main = do_something_for_test_3

You can put the code of all the tests into one file. Each test code generator is responsible for writing do_something_for_test_N.

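module Main where
import Control.Monad (zipWithM_)
import System.Directory (createDirectory, getCurrentDirectory, setCurrentDirectory)
boilerplate code

-- Run each test in its own directory, then return to the original one
withTestDir d m = do
  cwd <- getCurrentDirectory
  createDirectory d
  setCurrentDirectory d
  m
  setCurrentDirectory cwd

-- ["test1", "test2", ...]
dirNames = map ("test"++) $ map show [1..]
-- zipWithM_ discards the list of unit results
main = zipWithM_ withTestDir dirNames tests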

-- Put tests here
tests =
  [ do do_something_for_test_1
  , do do_something_for_test_2
  , ...
  ]

Now you only incur the overhead of a single call to runhaskell.

Upvotes: 0

MathematicalOrchid

Reputation: 62818

Alright, I have a solution: I created a single GHCi process and connected its stdin to a pipe, so that I can send it expressions to interactively evaluate.

Several fairly large program refactorings later, and the entire test suite now takes roughly 8 seconds to execute, rather than 48 seconds. That'll do for me! :-D

(To anyone else trying to do this: For the love of God, remember to pass the -v0 switch to GHCi, or you'll get a GHCi welcome banner! Weirdly, if you run GHCi interactively, even with -v0 the command prompt still appears, but when connected to a pipe the command prompt vanishes; I'm presuming this is a helpful design feature rather than a random accident.)
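For anyone wanting to try the same approach, here is a minimal sketch of driving one GHCi process through a pipe using the process package. It is not the actual code from my tester: the names `ghciArgs`, `ghciScript` and `runInGhci` are made up for illustration, and it assumes `ghc` is on the search path (`ghc --interactive` starts GHCi).

```haskell
import System.Exit (ExitCode)
import System.IO
import System.Process

-- Arguments for a quiet GHCi session; -v0 suppresses the welcome banner.
ghciArgs :: [String]
ghciArgs = ["--interactive", "-v0"]

-- The text sent down the pipe: one expression per line, then :quit.
ghciScript :: [String] -> String
ghciScript exprs = unlines (exprs ++ [":quit"])

-- Start a single GHCi process, feed it every expression over stdin,
-- and send both its stdout and stderr to one log file (an assumption
-- of this sketch, chosen to match the single-output-file approach).
runInGhci :: FilePath -> [String] -> IO ExitCode
runInGhci logPath exprs = do
  logH <- openFile logPath WriteMode
  (Just hin, _, _, ph) <-
    createProcess (proc "ghc" ghciArgs)
      { std_in  = CreatePipe
      , std_out = UseHandle logH
      , std_err = UseHandle logH
      }
  hPutStr hin (ghciScript exprs)
  hClose hin          -- flushes the pipe and signals end of input
  waitForProcess ph
```

Something like `runInGhci "tests.log" [":load Test1.hs", "main"]` would then load and run a file inside the one long-lived process; the file name here is just an example.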


Of course, half the reason I'm going down this strange route is that I want to capture stdout and stderr to a file. Using runhaskell, that's quite easy; just pass the appropriate options when creating the child process. But now all of the test cases are being run by a single OS process, so there's no obvious way to redirect stdout and stderr separately for each test.

The solution I came up with was to direct all test output to a single file, and between tests have GHCi print out a magic string which (I hope!) won't appear in test output. Then quit GHCi, slurp up the file, and look for the magic strings so I can snip the file into suitable chunks.
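The snipping step can be sketched as a small pure function. The marker text below is a made-up example, and the sketch assumes the marker always ends up on a line of its own, which holds only if each test's output ends with a newline.

```haskell
-- Hypothetical magic string; pick something the tests will never print.
marker :: String
marker = "<<<END-OF-TEST>>>"

-- The command to send to GHCi after each test so the marker lands in the log.
markerCommand :: String
markerCommand = "putStrLn " ++ show marker

-- Split the combined log into one chunk per test. Each occurrence of the
-- marker line closes a chunk; leftover text after the last marker is kept
-- as a final chunk only if it is non-empty.
splitOnMarker :: String -> String -> [String]
splitOnMarker m = map unlines . go . lines
  where
    go ls = case break (== m) ls of
      (chunk, [])       -> [chunk | not (null chunk)]
      (chunk, _ : rest) -> chunk : go rest
```

With this, `splitOnMarker marker <$> readFile logFile` gives one string per test, in the order the tests ran (`logFile` being whatever file the output was directed to). A test that prints nothing shows up as an empty chunk rather than being skipped, so the chunks stay aligned with the list of tests.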

Upvotes: 9

ivanm

Reputation: 3927

If the majority of the source files remain unchanged, you may be able to use GHC's -fobject-code flag (possibly in conjunction with -outputdir) to compile some of the library files.

Upvotes: 2
