Using vectors for performance improvement in Haskell

Question

I'm very new to Haskell, and I have a question about what performance improvements can be had by using impure (mutable) data structures. I'm trying to piece together a few different things I've heard, so please bear with me if my terminology is not entirely correct, or if there are some small errors.

To make this concrete, consider the quicksort algorithm (taken from the Haskell wiki).

quicksort :: Ord a => [a] -> [a]
quicksort []     = []
quicksort (p:xs) = (quicksort lesser) ++ [p] ++ (quicksort greater)
    where
        lesser  = filter (< p) xs
        greater = filter (>= p) xs

This is not "true quicksort." A "true" quicksort algorithm is in-place, and this is not. This is very memory inefficient.

On the other hand, it is possible to use vectors in Haskell to implement an in-place quicksort. An example is given in this stackoverflow answer.

How much faster is the second algorithm than the first? Big O notation doesn't help here, because the performance improvement is going to be from using memory more efficiently, not having a better algorithm (right?). I tired to construct some test cases on my own, but I had difficult getting things running.

An ideal answer would give some idea of what makes the in-place Haskell algorithm faster theoretically, and an example comparison of running times on some test data set.

leftaroundabout · Accepted Answer

There's nothing better than a test, right? And the results are not unsurprising: for lists of random integers in range [0 .. 1000000],

list size: 200000         ghc              -O2     -fllvm  -fllvm-O2
────────                   ────────   ────────   ────────   ────────
Data.List.sort            0.878969s  0.883219s  0.878106s  0.888758s
Naïve.quicksort           0.711305s  0.870647s  0.845508s  0.919925s
UArray_IO.quicksort       9.317783s  1.919583s  9.390687s  1.945072s
Vector_Mutable.quicksort   1.48142s  0.823004s  1.526661s  0.806837s

Here, Data.List.sort is just what it is, Naïve.quicksort is the algorithm you quoted, UArray_IO.quicksort and Vector_Mutable.quicksort are taken from the question you linked to: klapaucius' and Dan Burton's answer which turn out to be very suboptimal performance-wise, see what better Daniel Fischer could do it, both wrapped so as to accept lists (not sure if I got this quite right):

quicksort :: [Int] -> [Int]
quicksort l = unsafePerformIO $ do
  let bounds = (0, length l)
  arr <- newListArray bounds l :: IO (IOUArray Int Int)
  uncurry (qsort arr) bounds
  getElems arr

and

quicksort :: Ord a => [a] -> [a]
quicksort = toList . iqsort . fromList

respectively.

As you can see, the naïve algorithm is not far behind the mutable solution with Data.Vector in terms of speed for sorting a list of random-generated integers, and the IOUArray is actually much worse. Test was carried out on an Intel i5 laptop running Ubuntu 11.10 x86-64.

The following doesn't really make much sense considering that ɢᴏᴏᴅ mutable implementations are, after all, still well ahead of all those compared here.

Note that this does not mean that a nice list-based program can always keep up with its mutably-implemented equivalents, but GHC sure does a great job at bringing the performance close. Also, it depends of course on the data: these are the times when the random-generated lists to sort contain values in between 0 and 1000 rather than 0 an 1000000 as above, i.e. with many duplicates:

list size: 200000         ghc               -O2      -fllvm  -fllvm-O2
────────                    ────────   ────────    ────────   ────────
Data.List.sort             0.864176s  0.882574s   0.850807s  0.857957s
Naïve.quicksort            1.475362s  1.526076s   1.475557s  1.456759s
UArray_IO.quicksort       24.405938s  5.255001s  23.561911s  5.207535s
Vector_Mutable.quicksort   3.449168s  1.125788s   3.202925s  1.117741s

Not to speak of pre-sorted arrays.

What's quite interesting, (becomes only apparent with really large sizes, which require rtsopts to increase the stack capacity), is how both mutable implementations become significantly slower with -fllvm -O2:

list size: 3⋅10⁶        ghc      -O1   -fllvm-O1         -O2   -fllvm-O2
────────                    ────────    ────────    ────────    ────────
Data.List.sort            23.897897s  24.138117s  23.708218s  23.631968s
Naïve.quicksort           17.068644s  19.547817s  17.640389s  18.113622s
UArray_IO.quicksort       35.634132s  38.348955s  37.177606s  49.190503s
Vector_Mutable.quicksort  17.286982s  17.251068s  17.361247s  36.840698s

It seems kind of logical to me that the immutable implementations fare better on llvm (doesn't it do everything immutably on some level?), though I don't understand why this only becomes apparent as a slowdown to the mutable versions at high optimisation and large data sizes.

Testing program:

$ cat QSortPerform.hs
module Main where

import qualified Data.List(sort)
import qualified Naïve
import qualified UArray_IO
import qualified Vector_Mutable

import Control.Monad
import System.Random
import System.Environment

sortAlgos :: [ (String, [Int]->[Int]) ]
sortAlgos = [ ("Data.List.sort", Data.List.sort)
            , ("Naïve.quicksort", Naïve.quicksort)
            , ("UArray_IO.quicksort", UArray_IO.quicksort)
            , ("Vector_Mutable.quicksort", Vector_Mutable.quicksort) ]

main = do
   args <- getArgs
   when (length args /= 2) $ error "Need 2 arguments"

   let simSize = read $ args!!1
   randArray <- fmap (take simSize . randomRs(0,1000000)) getStdGen

   let sorted = case filter ((== args!!0) . fst) sortAlgos of
        [(_, algo)] -> algo randArray
        _ -> error $ "Argument must be one of " 
                        ++ show (map fst sortAlgos)

   putStr "First element:  "; print $ sorted!!0
   putStr "Middle element: "; print $ sorted!!(simSize`div`2)
   putStr "Last element:   "; print $ sorted!!(simSize-1)

which takes the algorithm name and array size on command-line. Runtime comparison was done with this program:

$ cat PerformCompare.hs
module Main where

import System.Process
import System.Exit
import System.Environment
import Data.Time.Clock
import Data.List
import Control.Monad
import Text.PrettyPrint.Boxes

compiler = "ghc"
testProgram = "./QSortPerform"
flagOpts = [[], ["-O2"], ["-fllvm"], ["-fllvm","-O2"]]
algos = ["Data.List.sort","Naïve.quicksort","UArray_IO.quicksort","Vector_Mutable.quicksort"]


main = do
   args <- getArgs
   let testSize = case args of
         [numb] -> read numb
         _      -> 200000

   results <- forM flagOpts $ \flags -> do

      compilerExitC <- verboseSystem
              compiler $ testProgram : "-fforce-recomp" : flags
      when (compilerExitC /= ExitSuccess) .
         error $ "Compiler error \"" ++ show compilerExitC ++"\""

      algoCompare <- forM algos $ \algo -> do
         startTime <- getCurrentTime
         exitC <- verboseSystem testProgram [algo, show testSize]
         endTime <- getCurrentTime
         when (exitC /= ExitSuccess) .
            error $ "Program error \"" ++ show exitC ++"\""
         return . text . show $ diffUTCTime endTime startTime

      return . vcat right $ text(concat flags)
                          : text("────────")
                          : algoCompare

   let table = hsep 2 bottom
         $ vcat left (map text $ ("list size: "++show testSize)
                               : "────────"
                               : algos                          )
         : results

   printBox table



verboseSystem :: String -> [String] -> IO ExitCode
verboseSystem cmd args = do
   putStrLn . unwords $ cmd : args
   rawSystem cmd args

Using vectors for performance improvement in Haskell

Answers (2)

Testing program:

Related Questions