Reputation: 459
I want to play around with text statistics, comparing texts pairwise by looking at relative frequencies of words in them (typically by computing the sum of absolute values of differences). This is O(n^2) in the number of texts, so precomputation within each text is ok. My question is about how to represent such statistics. I have tried two ways:
1. Vector (T.Text, Double), sorted by hand during precomputation. Given two such vectors, the sum is computed by a recursive function, a kind of zip that keeps track of how the first elements of the pairs line up, followed by a fold.
2. Map T.Text Double, cooking up the same thing with mergeWithKey (\k x y -> Just (abs (x - y))) id id and a foldl' (+) 0 on top.
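Concretely, the Map version amounts to something like this (a minimal sketch; the function name is made up for this question):

    import qualified Data.Map.Strict as M
    import qualified Data.Text as T

    -- Sum of absolute differences of word frequencies.  A word missing from one
    -- text counts with frequency 0 there, hence the two `id`s passed to
    -- mergeWithKey, which keep unmatched entries as they are.
    mapDistance :: M.Map T.Text Double -> M.Map T.Text Double -> Double
    mapDistance a b =
      M.foldl' (+) 0 (M.mergeWithKey (\_ x y -> Just (abs (x - y))) id id a b)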
The second way is much more expressive, because a Map is essentially what text statistics really are, and the code is much shorter. But on the other hand the Vector is about 3 times faster, at the cost of a lot of verbosity, and somehow it feels wrong, like a naive implementation of a Map. Of course it misses all the fancy insert / update / whatever, but I don't need that.
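For comparison, the Vector version boils down to a merge-style walk roughly like the following (a sketch with made-up names, not the exact code):

    {-# LANGUAGE BangPatterns #-}

    import qualified Data.Text as T
    import qualified Data.Vector as V

    -- Linear walk over two vectors sorted by word.  A word present in only one
    -- text contributes its full frequency to the distance.
    vecDistance :: V.Vector (T.Text, Double) -> V.Vector (T.Text, Double) -> Double
    vecDistance xs ys = go 0 0 0
      where
        go !acc i j
          | i >= V.length xs = acc + V.sum (V.map snd (V.drop j ys))
          | j >= V.length ys = acc + V.sum (V.map snd (V.drop i xs))
          | otherwise =
              let (wx, fx) = xs V.! i
                  (wy, fy) = ys V.! j
              in case compare wx wy of
                   LT -> go (acc + fx) (i + 1) j
                   GT -> go (acc + fy) i (j + 1)
                   EQ -> go (acc + abs (fx - fy)) (i + 1) (j + 1)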
Am I missing something here, like a third data structure that would be better for the task?
Upvotes: 2
Views: 165
Reputation: 89043
Suppose you have two documents, both with O(m) words. Then both of your implementations take O(m log m) time to compare the documents.
Vector version:
- Sorting document 1 into a Vector (Text, Double) ~ O(m log m)
- Sorting document 2 into a Vector (Text, Double) ~ O(m log m)
- Stepping through vectors 1 and 2 ~ O(m)
- Total: O(m log m)

Map version:
- Storing document 1 in a Map Text Double ~ O(m log m)
- Storing document 2 in a Map Text Double ~ O(m log m)
- Stepping through maps 1 and 2 ~ O(m log m)
- Total: O(m log m)
So your solutions are asymptotically equivalent, but this doesn't mean that both should have the same runtime. Testing against real data to see which has the smaller coefficient is the work of profiling, and is entirely appropriate at this point. The Vector solution may be less elegant, but it's thoroughly believable that it's more efficient.
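To put numbers on that, a quick benchmark along the lines of the following sketch (using criterion, with synthetic data and the two distance functions sketched in the question) will show which coefficient is smaller on your corpus:

    import Criterion.Main (bench, defaultMain, whnf)
    import qualified Data.Map.Strict as M
    import qualified Data.Text as T
    import qualified Data.Vector as V

    -- Toy benchmark comparing the two distance functions on synthetic data;
    -- swap in your real precomputed statistics.  For careful measurements you
    -- would normally force the inputs up front with criterion's `env`.
    main :: IO ()
    main = defaultMain
      [ bench "Map distance"    $ whnf (mapDistance statsA) statsB
      , bench "Vector distance" $ whnf (vecDistance vecA) vecB
      ]
      where
        ws     = map (T.pack . show) [1 :: Int .. 10000]
        statsA = M.fromList (zip ws [1 ..])
        statsB = M.fromList (zip ws [2 ..])
        vecA   = V.fromList (M.toAscList statsA)
        vecB   = V.fromList (M.toAscList statsB)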
After this point you could continue to optimize your run time by accepting approximations:
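One cheap approximation, purely as an illustration, is to keep only each text's k most frequent words before comparing (k is a tuning knob, not anything prescribed above):

    import qualified Data.List as L
    import qualified Data.Map.Strict as M
    import qualified Data.Text as T

    -- Illustrative approximation: truncate a text's statistics to its k most
    -- frequent words, trading accuracy for a smaller structure to merge.
    topK :: Int -> M.Map T.Text Double -> M.Map T.Text Double
    topK k = M.fromList . take k . L.sortOn (negate . snd) . M.toList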
Upvotes: 2