leedm777
leedm777

Reputation: 24062

Improvements for a Map merge function

I'm writing a function to merge two Map's together. This is what I have so far:

def merge[K, V1, V2, V3](left: Map[K, V1], right: Map[K, V2])
    (fn: (Option[V1], Option[V2]) => V3): Map[K, V3] = {
  val r = (left.keySet ++ right.keySet) map {
    key =>
      (key -> fn(left.get(key), right.get(key)))
  }
  r.toMap
}

The function itself works. You use the function as so:

val m1 = Map(1 -> "one", 3 -> "three", 5 -> "five")
val m2 = Map(1 -> "I", 5 -> "V", 10 -> "X")
merge(m1, m2) { (_, _) } 
// returns: 
// Map(1 -> (Some(one),Some(I)), 
//     3 -> (Some(three),None), 
//     5 -> (Some(five),Some(V)), 
//     10 -> (None,Some(X)))

I have two questions:

  1. I'm worried about the performance computational complexity of the .get and .toMap calls. Can anyone improve the implementation?
  2. I'd like the default function to make a pair of the values ({ (_, _) }). I can't quite get the syntax to do so properly.

Edit: While I originally said performance, I meant computational complexity. My guess is that this function performs in O(n•ln(n)) time. Looks like my function performs roughly in O(n). Can it be done in O(ln(n))?

Upvotes: 4

Views: 873

Answers (2)

huynhjl
huynhjl

Reputation: 41646

For the default function literal use:

(fn: (Option[V1], Option[V2]) => V3 = 
  (x: Option[V1], y: Option[V2]) => Tuple2(x,y))

You'll have to use merge like this: merge(m1,m2)()

I would say don't be worried about performance until you perform some measurements on actual data.

Edit: about performance, by providing a view instead of constructing a map you can get quick "construction" at the expense of lookup - assuming we are dealing with immutable maps. So depending on actual data and use case, you can get better performance for certain operations, but it has a trade-off.

class MergedView[K, V1, V2, V3](
    left: Map[K, V1], right: Map[K, V2]
  )(fn: (Option[V1], Option[V2]) => V3 = (x: Option[V1], y: Option[V2]) => Tuple2(x,y)
  ) extends collection.DefaultMap[K, V3] {
  def get(key: K): Option[V3] = (left.get(key), right.get(key)) match {
    case (None, None) => None
    case t => Some(fn(t._1, t._2))
  }
  lazy val tuples = (left.keys ++ right.keys).map(key => key -> get(key).get)
  def iterator: Iterator[(K, V3)] = tuples.iterator
} 

val r1 = new MergedView(m1, m2)() // use parens here for second param list.

Upvotes: 3

Daniel C. Sobral
Daniel C. Sobral

Reputation: 297275

You shouldn't worry about get -- yes, it will create a wrapper object, but doing anything else will be awkward enough that you shouldn't try unless a profiler shows that to be a problem.

As for toMap, yes, that might well slow you down. You may try using breakOut.

Regarding complexity of get and toMap, lookup and add are effective constant time for immutable HashMap, which is the default Map. See Performance Characteristics of Scala Collections.

Upvotes: 1

Related Questions