Shoe

Reputation: 76240

What container really mimics std::vector in Haskell?

The problem

I'm looking for a container that is used to save partial results of n - 1 problems in order to calculate the nth one. This means that the size of the container, at the end, will always be n.

Each element, i, of the container depends on at least 2 and up to 4 previous results.

The container has to provide:

  • constant time insertion at either the beginning or the end
  • constant time random access

or alternatively (given an O(n) initialization):

  • constant time modification
  • constant time random access

What is std::vector and why is it relevant

For those of you who don't know C++, std::vector is a dynamically sized array. It is a perfect fit for this problem because it is able to:

  • insert an element at the end in amortized constant time
  • access any element in constant time

Therefore this problem is solvable in O(n) complexity, in C++.

Why Data.Vector is not std::vector

Data.Vector, together with Data.Array, provides functionality similar to std::vector, but not quite the same. Both, of course, offer constant time indexing in the middle, but they offer neither constant time modification ((//), for example, is at least O(n)) nor constant time insertion at either the beginning or the end.
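For instance, a quick sketch of that update cost model (the values here are arbitrary): (//) hands back a fresh vector and leaves the original untouched, so every update pays for a full copy:

```haskell
import qualified Data.Vector as V

original :: V.Vector Int
original = V.fromList [10, 20, 30]

-- (//) builds a brand-new vector in O(n); 'original' is not mutated
updated :: V.Vector Int
updated = original V.// [(1, 99)]

main :: IO ()
main = do
  print (V.toList updated)  -- [10,99,30]
  print (original V.! 1)    -- still 20: no in-place modification
```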

Conclusion

What container really mimics std::vector in Haskell? Alternatively, what is my best shot?

Upvotes: 17

Views: 2156

Answers (4)

Thomas

Reputation: 11888

I'm looking for a container that is used to save partial results of n - 1 problems in order to calculate the nth one.

Each element, i, of the container depends on at least 2 and up to 4 previous results.

Let's consider a very small program that calculates Fibonacci numbers.

fib 1 = 1
fib 2 = 1 
fib n = fib (n-1) + fib (n-2)

This is great for small n, but horrible for n > 10. At that point, you stumble across this gem:

fib n = fibs !! n where fibs = 1 : 1 : zipWith (+) fibs (tail fibs)

You may be tempted to exclaim that this is dark magic (infinite, self-referential list building and zipping? wth!) but it is really a great example of tying the knot, and of using laziness to ensure that values are calculated as needed.

Similarly, we can use an array to tie the knot too.

import Data.Array

fib n = arr ! n
  where arr :: Array Int Int
        arr = listArray (1,n) (map fib' [1..n])
        fib' 1 = 1
        fib' 2 = 1
        fib' i = arr!(i-1) + arr!(i-2)

Each element of the array is a thunk that uses other elements of the array to calculate its value. In this way, we can build a single array, never having to perform concatenation, and demand values from the array at will, only paying for the calculation up to that point.

The beauty of this method is that you don't only get to look behind you; you can look in front of you as well.
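A contrived sketch of that forward-looking ability (the recurrence here is made up purely for illustration): entries at even positions read the entry after them, which no strict left-to-right fill could do.

```haskell
import Data.Array

-- Even positions look *ahead* at the next entry; odd positions and
-- the last entry depend only on the final element. As long as the
-- dependencies are acyclic, laziness sorts out the evaluation order.
forwardBack :: Int -> Array Int Int
forwardBack n = arr
  where
    arr = listArray (1, n) (map go [1 .. n])
    go i
      | i == n    = 1
      | even i    = arr ! (i + 1) + 1  -- a forward reference
      | otherwise = arr ! n            -- a reference to the last entry
```

For example, elems (forwardBack 5) is [1,2,1,2,1].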

Upvotes: 2

rampion

Reputation: 89053

From reddit comes the suggestion to use Data.Vector.constructN:

O(n) Construct a vector with n elements by repeatedly applying the generator function to the already constructed part of the vector.

 constructN 3 f = let a = f <> ; b = f <a> ; c = f <a,b> in f <a,b,c>

For example:

λ import qualified Data.Vector as V
λ V.constructN 10 V.length
fromList [0,1,2,3,4,5,6,7,8,9]
λ V.constructN 10 $ (1+) . V.sum
fromList [1,2,4,8,16,32,64,128,256,512]
λ V.constructN 10 $ \v -> let n = V.length v in if n <= 1 then 1 else (v V.! (n - 1)) + (v V.! (n - 2))
fromList [1,1,2,3,5,8,13,21,34,55]

This certainly seems to qualify to solve the problem as you've described it above.
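Indeed, sketching the question's "at least 2 and up to 4 previous results" with a made-up recurrence (each new entry is the sum of up to its four predecessors):

```haskell
import qualified Data.Vector as V

-- Hypothetical stand-in for the question's recurrence: each step sums
-- the (up to) four entries before it. constructN hands the step
-- function the already-built prefix, so the whole build is O(n).
tetra :: Int -> V.Vector Int
tetra n = V.constructN n step
  where
    step v
      | k < 2     = 1
      | otherwise = V.sum (V.drop (max 0 (k - 4)) v)
      where
        k = V.length v
```

Here V.toList (tetra 7) is [1,1,2,4,8,15,29].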

Upvotes: 12

Boyd Stephen Smith Jr.

Reputation: 3202

When looking for functional containers with particular asymptotic run times, I always pull out Edison.

Note that there's a known result that in a strict language with immutable data structures, there's always a logarithmic slowdown when implementing a mutable data structure on top of them. It's an open problem whether the limited mutation hidden behind laziness can avoid that slowdown. There's also the issue of persistent vs. transient...

Okasaki is still a good read for background, but finger trees or something more complex like an RRB-tree should be available "off-the-shelf" and solve your problem.

Upvotes: 3

epsilonhalbe

Reputation: 15967

The first data structures that come to my mind are either Maps from Data.Map or Sequences from Data.Sequence.

Update

Data.Sequence

Sequences are persistent data structures that support most operations efficiently, while allowing only finite sequences. Their implementation is based on finger trees, if you are interested. But what qualities does it have?

  • O(1) calculation of the length
  • O(1) insert at front/back with the operators <| and |> respectively.
  • O(n) creation from a list with fromList
  • O(log(min(n1,n2))) concatenation for sequences of length n1 and n2.
  • O(log(min(i,n-i))) indexing for an element at position i in a sequence of length n.

Furthermore, this structure supports a lot of the known and handy functions you'd expect from a list-like structure: replicate, zip, null, scans, sort, take, drop, splitAt and many more. Because of these name clashes, you have to either import it qualified or hide the Prelude functions of the same name.
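As a sketch of the question's pattern on top of Data.Sequence (with Fibonacci standing in for the real recurrence): grow the sequence with (|>) in O(1) and read earlier entries back with index in O(log n):

```haskell
import Data.List (foldl')
import Data.Sequence (Seq, (|>), index)
import qualified Data.Sequence as Seq

-- Build the first n Fibonacci numbers left to right, looking back
-- at the two previous entries of the growing sequence.
fibSeq :: Int -> Seq Int
fibSeq n = foldl' step Seq.empty [1 .. n]
  where
    step s _
      | k < 2     = s |> 1
      | otherwise = s |> (index s (k - 1) + index s (k - 2))
      where
        k = Seq.length s
```

Converting with Data.Foldable.toList, fibSeq 10 gives [1,1,2,3,5,8,13,21,34,55].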

Data.Map

Maps are the standard workhorse for realizing a correspondence between "things"; what you might call a hashmap or associative array in other programming languages is called a Map in Haskell. Unlike in, say, Python, Maps are pure - an update gives you back a new Map and does not modify the original instance.

Maps come in two flavors - strict and lazy.

Quoting from the Documentation

Strict

API of this module is strict in both the keys and the values.

Lazy

API of this module is strict in the keys, but lazy in the values.

So you need to choose what fits best for your application. You can try both versions and benchmark with criterion.
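A minimal sketch of that difference, with a throwaway key and value: the lazy API stores the value as an unevaluated thunk, so a purely structural query succeeds even though forcing the value would crash (with Data.Map.Strict the insert itself would raise the error):

```haskell
import qualified Data.Map.Lazy as ML

-- The value is an error thunk; the lazy API never forces it.
m :: ML.Map Int Int
m = ML.insert 1 (error "never forced") ML.empty

main :: IO ()
main = print (ML.size m)  -- prints 1; Data.Map.Strict.insert would crash here
```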

Instead of listing the features of Data.Map I want to pass on to

Data.IntMap.Strict

which can leverage the fact that the keys are integers to squeeze out better performance. Quoting from the documentation, we first note:

Many operations have a worst-case complexity of O(min(n,W)). This means that the operation can become linear in the number of elements with a maximum of W -- the number of bits in an Int (32 or 64).

So what are the characteristics for IntMaps

  • O(min(n,W)) for (unsafe) indexing (!), unsafe in the sense that you will get an error if the key/index does not exist. This is the same behavior as Data.Sequence.
  • O(n) calculation of size
  • O(min(n,W)) for safe indexing lookup, which returns a Nothing if the key is not found and Just a otherwise.
  • O(min(n,W)) for insert, delete, adjust and update

So you see that this structure is less efficient than Sequences, but it provides a bit more safety and a big benefit if you don't actually need all entries, such as when representing a sparse graph whose nodes are integers.
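As a sketch (again with Fibonacci standing in for the question's recurrence), you build the table with insert and read earlier entries back with (!), each step costing O(min(n,W)):

```haskell
import Data.List (foldl')
import qualified Data.IntMap.Strict as IM

-- Fill keys 1..n left to right; each step looks up the two
-- previous entries, both O(min(n,W)) operations.
fibMap :: Int -> IM.IntMap Int
fibMap n = foldl' step IM.empty [1 .. n]
  where
    step m k
      | k <= 2    = IM.insert k 1 m
      | otherwise = IM.insert k (m IM.! (k - 1) + m IM.! (k - 2)) m
```

Here fibMap 10 IM.! 10 evaluates to 55.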

For completeness I'd like to mention a package called persistent-vector, which implements Clojure-style vectors, but it seems to be abandoned, as the last upload is from 2012.

Conclusion

So for your use case I'd strongly recommend Data.Sequence or Data.Vector; unfortunately I don't have any experience with the latter, so you need to try it for yourself. From what I know, it provides a powerful optimization called stream fusion, which executes multiple functions in one tight "loop" instead of running a loop for each function. A tutorial for Vector can be found here.

Upvotes: 9
