Reputation: 2871

Non-monolithic arrays in Haskell

I have accepted an answer to the question below, but It seemed I misunderstood how Arrays in haskell worked. I thought they were just beefed up lists. Keep that in mind when reading the question below.

I've found that monolithic arrays in haskell are quite inefficient when using them for larger arrays.

I haven't been able to find a non-monolithic implementation of arrays in haskell. What I need is O(1) time look up on a multidimensional array.

Is there an implementation of of arrays that supports this?

EDIT: I seem to have misunderstood the term monolithic. The problem is that it seems like the arrays in haskell treats an array like a list. I might be wrong though.

EDIT2: Short example of inefficient code:

fibArray n = a where
  bnds = (0,n)
  a = array bnds [ (i, f i) | i <- range bnds ]
  f 0 = 0
  f 1 = 1
  f i = a!(i-1) + a!(i-2)

this is an array of length n+1 where the i'th field holds the i'th fibonacci number. But since arrays in haskell has O(n) time lookup, it takes O(n²) time to compute.

Upvotes: 2

Answers (3)

Will Ness

Reputation: 71119

Referring specifically to your fibArray example, try this and see if it speeds things up a bit:

-- gradually calculate m-th item in steps of k
--     to prevent STACK OVERFLOW , etc
gradualth m k arr                         
    | m <= v = pre `seq` arr!m   
  where                                   
    pre = foldl1 (\a b-> a `seq` arr!b) [u,u+k..m]
    (u,v) = bounds arr

For me, for let a=fibArray 50000, gradualth 50000 10 aran at 0.65 run time of just calling a!50000 right away.

Upvotes: 1

Don Stewart

Reputation: 137997

You're confusing linked lists in Haskell with arrays.

Linked lists are the data types that use the following syntax:

[1,2,3,5]

defined as:

data [a] = [] | a : [a]

These are classical recursive data types, supporting O(n) indexing and O(1) prepend.

If you're looking for multidimensional data with O(1) lookup, instead you should use a true array or matrix data structure. Good candidates are:

Repa - fast, parallel, multidimensional arrays -- (Tutorial)
Vector - An efficient implementation of Int-indexed arrays (both mutable and immutable), with a powerful loop optimisation framework . (Tutorial)
HMatrix - Purely functional interface to basic linear algebra and other numerical computations, internally implemented using GSL, BLAS and LAPACK.

Upvotes: 8

John L

Reputation: 28097

Arrays have O(1) indexing. The problem is that each element is calculated lazily. So this is what happens when you run this in ghci:

*Main> :set +s
*Main> let t = 100000
(0.00 secs, 556576 bytes)
*Main> let a = fibArray t
Loading package array-0.4.0.0 ... linking ... done.
(0.01 secs, 1033640 bytes)
*Main> a!t  -- result omitted
(1.51 secs, 570473504 bytes)
*Main> a!t  -- result omitted
(0.17 secs, 17954296 bytes)
*Main>

Note that lookup is very fast, after it's already been looked up once. The array function creates an array of pointers to thunks that will eventually be calculated to produce a value. The first time you evaluate a value, you pay this cost. Here are a first few expansions of the thunk for evaluating a!t:

a!t -> a!(t-1)+a!(t-2)-> a!(t-2)+a!(t-3)+a!(t-2) -> a!(t-3)+a!(t-4)+a!(t-3)+a!(t-2)

It's not the cost of the calculations per se that's expensive, rather it's the need to create and traverse this very large thunk.

I tried strictifying the values in the list passed to array, but that seemed to result in an endless loop.

One common way around this is to use a mutable array, such as an STArray. The elements can be updated as they're available during the array creation, and the end result is frozen and returned. In the vector package, the create and constructN functions provide easy ways to do this.

-- constructN :: Unbox a => Int -> (Vector a -> a) -> Vector a


import qualified Data.Vector.Unboxed as V
import Data.Int

fibVec :: Int -> V.Vector Int64
fibVec n = V.constructN (n+1) c
 where
  c v | V.length v == 0 = 0 
  c v | V.length v == 1 = 1 
  c v | V.length v == 2 = 1
  c v = let len = V.length v
        in v V.! (len-1) + v V.! (len-2)

BUT, the fibVec function only works with unboxed vectors. Regular vectors (and arrays) aren't strict enough, leading back to the same problem you've already found. And unfortunately there isn't an Unboxed instance for Integer, so if you need unbounded integer types (this fibVec has already overflowed in this test) you're stuck with creating a mutable array in IO or ST to enable the necessary strictness.

Upvotes: 5

Non-monolithic arrays in Haskell

Answers (3)

Related Questions