Reputation: 5063
I am kind of new using data structures in haskell besides of Lists. My goal is to chose one container among Data.Vector
, Data.Sequence
, Data.List
, etc ... My problem is the following:
I have to create a sequence (mathematically speaking). The sequence starts at 0. In each iteration two new elements are generated but only one should be appended based in whether the first element is already in the sequence. So in each iteration there is a call to elem
function (see the pseudo-code below).
appendNewItem :: [Integer] -> [Integer]
appendNewItem acc = let firstElem = someFunc
secondElem = someOtherFunc
newElem = if firstElem `elem` acc
then secondElem
else firstElem
in acc `append` newElem
sequenceUptoN :: Int -> [Integer]
sequenceUptoN n = (iterate appendNewItem [0]) !! n
Where append
and iterate
functions vary depending on which colection you use (I am using lists in the type signature for simplicity).
The question is: Which data structure should I use?. Is Data.Sequence
faster for this task because of the Finger Tree inner structure?
Thanks a lot!!
Upvotes: 3
Views: 752
Reputation: 1590
I suggest using Data.Sequence
in conjunction with Data.Set
. The Sequence
to hold the sequence of values and the Set
to track the collection.
Sequence
, List
, and Vector
are all structures for working with values where the position in the structure has primary importance when it comes to indexing. In lists we can manipulate elements at the front efficiently, in sequences we can manipulate elements based on the log of the distance the closest end, and in vectors we can access any element in constant time. Vectors however, are not that useful if the length keeps changing, so that rules out their use here.
However, you also need to lookup a certain value within the list, which these structures don't help with. You have to search the whole of a list/sequence/vector to be certain that a new value isn't present. Data.Map
and Data.Set
are two of the structures for which you define an index value based on Ord
, and let you lookup/insert in log(n)
. So, at the cost of memory usage you can lookup the presence of firstElem
in your Set in log(n)
time and then add newElem
to the end of the sequence in constant time. Just make sure to keep these two structures in synch when adding or taking new elements.
Upvotes: 4
Reputation: 120711
No, sequences are not faster for searching. A Vector
is just a flat chunk of memory, which gives generally the best lookup performance. If you want to optimise searching, use Data.Vector.Unboxed
. (The normal, “boxed” variant is also pretty good, but it actually contains only references to the elements in the flat memory-chunk, so it's not quite as fast for lookups.)
However, because of the flat memory layout, Vector
s are not good for (pure-functional) appending: basically, whenever you add a new element, the whole array must be copied so as to not invalidate the old one (which somebody else might still be using). If you need to append, Seq
is a pretty good choice, although it's not as fast as destructive appending: for maximum peformance, you'll want to pre-allocate an uninitialized Data.Vector.Unboxed.Mutable.MVector
of the required size, populate it using the ST
monad, and freeze the result. But this is much more fiddly than purely-functional alternatives, so unless you need to squeeze out every bit of performance, Data.Sequence
is the way to go. If you only want to append, but not look up elements, then a plain old list in reverse order would also do the trick.
Upvotes: 6