mike3996

Reputation: 17497

When to use `zipmap` and when `map vector`?

I recently asked about a peculiarity of the zipmap construct, only to discover that I was apparently doing it wrong, and I learned about (map vector v u) in the process. Before that, however, I had been using zipmap to do (map vector ...)'s work. Did it work back then only because the resulting map was small enough to keep its elements in order?

So, to the actual question: what is zipmap for, and how and when should it be used? And when should (map vector ...) be used instead?

My original problem required preserving the original order, so building a map wouldn't be a good idea. But basically -- apart from the order of the resulting pairs -- these two methods are equivalent, because the seq'd map becomes a sequence of vectors.

(for [pair (map vector v (rest v))]
  ( ... )) ; do something with (first pair) and (last pair)

(for [pair (zipmap v (rest v))]
  ( ... )) ; do something with (first pair) and (last pair)
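A runnable version of the first loop might look like the sketch below. The value of v and the subtraction in the body are hypothetical stand-ins for the elided `( ... )`. Each pair is destructured as `[a b]` instead of calling `first` and `last`:

```clojure
;; Hypothetical input; any vector works here.
(def v [1 3 6 10])

;; Pair each element with its successor, then compute the difference.
;; The [[a b] ...] binding destructures each two-element vector,
;; which replaces (first pair) and (last pair).
(def diffs
  (for [[a b] (map vector v (rest v))]
    (- b a)))

diffs ; => (2 3 4)
```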

Upvotes: 25

Views: 16317

Answers (4)

Daniel Canas

Reputation: 956

The two may appear similar but in reality are very different.

  • zipmap creates a map
  • (map vector ...) creates a LazySeq of n-tuples (vectors of size n)

These are two different data structures. A lazy sequence of 2-tuples may look like a map, but it behaves differently: it keeps every pair in its original order, duplicates included, and it does not support lookup by key.
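The difference in return types can be checked directly at the REPL. A minimal sketch, with arbitrary input collections:

```clojure
(def ks [:k1 :k2 :k3])
(def vs [1 2 3])

;; zipmap returns a map.
(map? (zipmap ks vs))                                ; => true

;; (map vector ...) returns a lazy sequence of vectors, not a map.
(map? (map vector ks vs))                            ; => false
(instance? clojure.lang.LazySeq (map vector ks vs))  ; => true
(every? vector? (map vector ks vs))                  ; => true
```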

Say we are combining two collections, coll1 and coll2, and coll1 contains duplicate elements. The output of zipmap will contain only the value paired with the last occurrence of each duplicate key in coll1, whereas the output of (map vector ...) will contain a 2-tuple for every occurrence.

A simple REPL example:

=> (zipmap [:k1 :k2 :k3 :k1] [1 2 3 4])
{:k3 3, :k2 2, :k1 4}

=> (map vector [:k1 :k2 :k3 :k1] [1 2 3 4])
([:k1 1] [:k2 2] [:k3 3] [:k1 4])

With that in mind, it is trivial to see the danger in assuming the following:

But basically -- apart from the order of the resulting pairs -- these two methods are equivalent, because the seq'd map becomes a sequence of vectors.

The seq'd map becomes a sequence of vectors, but not necessarily the same sequence of vectors as the result of (map vector ...).
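Applied to the question's own pattern of pairing v with (rest v), the difference appears as soon as v contains a repeated element. A sketch with a hypothetical v:

```clojure
(def v [:a :b :a :c])

;; Every consecutive pair, in order.
(map vector v (rest v))
;; => ([:a :b] [:b :a] [:a :c])

;; As map keys, the second :a overwrites the first,
;; so the pair [:a :b] is lost. Print order is not guaranteed.
(zipmap v (rest v))
;; => a map with only two entries, :a -> :c and :b -> :a
```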

For completeness, here are both results as sorted sequences:

=> (sort (seq (zipmap [:k1 :k2 :k3 :k1] [1 2 3 4])))
([:k1 4] [:k2 2] [:k3 3])

=> (sort (seq (map vector [:k1 :k2 :k3 :k1] [1 2 3 4])))
([:k1 1] [:k1 4] [:k2 2] [:k3 3])

I think the closest we can get to a statement like the above is:

The set of the result of (zipmap coll1 coll2) will be equal to the set of the result of (map vector coll1 coll2) if coll1 is itself a set, i.e. if it contains no duplicate elements.

That is a lot of qualifiers for two operations that are supposedly very similar. That is why special care must be taken when deciding which one to use. They are very different, serve different purposes and should not be used interchangeably.
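The set-equality statement above can be checked at the REPL. In this sketch, same-pairs? is a hypothetical helper name used only for illustration:

```clojure
(defn same-pairs?
  "Hypothetical helper: true when zipmap and (map vector ...)
  produce the same set of [key value] pairs."
  [ks vs]
  (= (set (zipmap ks vs))
     (set (map vector ks vs))))

;; Distinct keys: both results contain the same pairs.
(same-pairs? [:k1 :k2 :k3] [1 2 3])        ; => true

;; Duplicate key :k1: zipmap keeps only the last value, so the sets differ.
(same-pairs? [:k1 :k2 :k3 :k1] [1 2 3 4])  ; => false
```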

Upvotes: 4

zmila

Reputation: 1660

(zipmap k v) takes two seqs and returns a map (which does not preserve the order of the elements).

(map vector s1 s2 ...) takes any number of seqs and returns a seq.

Use the first when you want to zip two seqs into a map.

Use the second when you want to apply vector (or list, or any other function that builds a collection) across multiple seqs, element by element.

There is some similarity to the "collate" option when you print several copies of a document :)
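A small sketch of the second case: map calls the given function with one element from each input seq, so any number of seqs can be combined, and functions other than vector work the same way:

```clojure
;; Three input seqs: each vector takes one element from each.
(map vector [1 2 3] [:a :b :c] ["x" "y" "z"])
;; => ([1 :a "x"] [2 :b "y"] [3 :c "z"])

;; list instead of vector.
(map list [1 2 3] [:a :b :c])
;; => ((1 :a) (2 :b) (3 :c))

;; The output stops at the shortest input.
(map vector [1 2 3] [:a :b])
;; => ([1 :a] [2 :b])
```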

Upvotes: 3

mikera

Reputation: 106351

Use (zipmap ...) when you want to directly construct a hashmap from separate sequences of keys and values. The output is a hashmap:

(zipmap [:k1 :k2 :k3] [10 20 40])
=> {:k3 40, :k2 20, :k1 10}

Use (map vector ...) when you are trying to merge multiple sequences. The output is a lazy sequence of vectors:

(map vector [1 2 3] [4 5 6] [7 8 9])
=> ([1 4 7] [2 5 8] [3 6 9])

Some extra notes to consider:

  • zipmap only works on two input sequences (keys and values), whereas map vector works on any number of input sequences. If your inputs are not keys and values, that is a good hint that you should be using map vector rather than zipmap
  • zipmap will be more efficient and simpler than doing map vector and then subsequently creating a hashmap from the key/value pairs - e.g. (into {} (map vector [:k1 :k2 :k3] [10 20 40])) is quite a convoluted way to do zipmap
  • map vector is lazy, so it brings a bit of extra overhead, but it is very useful in circumstances where you actually need laziness (e.g. when dealing with infinite sequences)
  • You can do (seq (zipmap ...)) to get a sequence of key-value pairs rather like (map vector ...), but be aware that this may reorder the key-value pairs (since the intermediate hashmap is unordered)
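The second and third points can be illustrated with a short sketch (the key and value collections are arbitrary):

```clojure
(def ks [:k1 :k2 :k3])
(def vs [10 20 40])

;; The same map built two ways; zipmap is the direct route.
(= (zipmap ks vs)
   (into {} (map vector ks vs)))
;; => true

;; map vector is lazy, so it can consume infinite sequences
;; as long as only a finite prefix is realized.
(take 3 (map vector (range) (iterate #(* 2 %) 1)))
;; => ([0 1] [1 2] [2 4])

;; zipmap is eager, but it stops when either input runs out, so a
;; finite key seq with an infinite value seq still returns.
;; (With two infinite inputs it would never return.)
(zipmap ks (range))
;; => a map of :k1 -> 0, :k2 -> 1, :k3 -> 2 (print order not guaranteed)
```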

Upvotes: 39

Maurits Rijk

Reputation: 9985

The methods are more or less equivalent. When you use zipmap you get a map with key/value pairs, and when you iterate over this map you get [key value] vectors. The order of the map, however, is not defined. With the 'map' construct in your first method, you create a sequence of two-element vectors whose order is defined.

Zipmap might be a bit less efficient in your example. I would stick with the 'map'.

Edit: Oh, and zipmap isn't lazy. So another reason not to use it in your example.

Edit 2: use zipmap when you really need a map, for example for fast random key-based access.
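A minimal sketch of key-based access (arbitrary data): a map answers a lookup by key directly, while a sequence of pairs has to be scanned until the key is found:

```clojure
(def ks [:k1 :k2 :k3])
(def vs [10 20 40])

;; A map supports direct lookup by key.
(def m (zipmap ks vs))
(get m :k2)  ; => 20
(m :k2)      ; => 20 (maps are also functions of their keys)

;; With a sequence of pairs, finding a key means walking the pairs in order.
(def pairs (map vector ks vs))
(some (fn [[k v]] (when (= k :k2) v)) pairs)  ; => 20
```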

Upvotes: 7
