Haskell performance when using classes and instances

Question

The Problem

I want to simulate in Haskell a multivalue outputting functions. The Haskell code is generated (not hand written) - this is important information, see below:

This can be of course easly done by returning a tuple from function, like

f x y = (x+y, x-y)

But when using such function I have to know what kind of tuple it returns:

...
(out_f_1, out_f_2)          = f a b
(out_g_1, out_g_2, out_g_3) = g out_f_1
...

And so on ... But while generating code, I don't know what is the type of ouput of lets say f, so right now I'm using the Data.List.Select package and simulate the above with:

import Data.List.Select
...
out_f = f a b
out_g = g (sel1 outf)
...

The problem is the performance - on my testing program, the version, which uses Data.List.Select is twice slower than the version written by hand.

This is very obvious situation, because Data.List.Select is written using classes and instances, so it uses some kind of runtime dictionary (If I'm not wrong). (http://hackage.haskell.org/packages/archive/tuple/0.2.0.1/doc/html/src/Data-Tuple-Select.html#sel1)

The Question

I want to ask you If is it possible to somehow compile the version (which uses Data.List.Select) to be as fast as the manually crafted one?

I think there should be a switch to compiler, which will tell him to "instantiate" the classes and interfaces for each use (something like templates from C++).

Benchmarks

Test1.hs:

import qualified Data.Vector as V
import System.Environment
b :: Int -> Int
b x = x + 5
c x = b x + 1
d x = b x - 1
a x = c x + d x
main = do
   putStrLn "Starting..."
   args <- getArgs
   let iternum = read (head args) :: Int in do
      putStrLn $ show $ V.foldl' (+) 0 $ V.map (\i -> a (iternum-i))
         $ V.enumFromTo 1 iternum
      putStrLn "Done."

compile with ghc -O3 Test1.hs

Test2.hs:

import qualified Data.Vector as V
import Data.Tuple.Select
import Data.Tuple.OneTuple

import System.Environment
b x = OneTuple $ x + 5
c x = OneTuple $ (sel1 $ b x) + 1
d x = OneTuple $ (sel1 $ b x) - 1
a x = OneTuple $ (sel1 $ c x) + (sel1 $ d x)
main = do
   putStrLn "Starting..."
   args <- getArgs
   let iternum = read (head args) :: Int in do
      putStrLn $ show $ V.foldl' (+) 0 $ V.map (\i -> sel1 $ a (iternum-i))
         $ V.enumFromTo 1 iternum
      putStrLn "Done."

compile with ghc -O3 Test2.hs

Results

time ./Test1 10000000 = 5.54 s
time ./Test2 10000000 = 10.06 s

Wojciech Danilo · Accepted Answer

Ok, the results I've posted are not accurate - as @sabauma told - the two codes perform in the same time If you compile them with optimizations enabled.

The @tohava's answer is very good if you want to explicity show which functions to specialize (see the @sabauma comment above).

Haskell performance when using classes and instances

Answers (2)

Related Questions