Matthew Moisen

Reputation: 18279

In which types of use cases is MapReduce superior to Spark?

I just attended an introductory class on Spark and asked the speaker whether Spark could fully replace MapReduce. I was told that Spark could be used in place of MapReduce for any use case, but that there are particular use cases in which MapReduce is actually faster than Spark.

What are the characteristics of the use cases that MapReduce can solve faster than Spark?

Upvotes: 2

Views: 246

Answers (1)

Sean Owen

Reputation: 66876

Pardon me for quoting myself from Quora, but:

  • For the data-parallel, one-pass, ETL-like jobs MapReduce was designed for, MapReduce is lighter-weight than the Spark equivalent
  • Spark is fairly mature, and so is YARN now, but Spark-on-YARN is still pretty new. The two may not be optimally integrated yet. For example, until recently I don't think Spark could ask YARN for allocations based on number of cores. That is: MapReduce might be easier to understand, manage and tune

You can reproduce almost all of MapReduce's behavior in Spark, since Spark exposes narrower, simpler functions that can be composed to produce much the same execution. That said, you don't always want to mimic MapReduce.
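To make the comparison concrete, here is a minimal stdlib-only sketch of the map/shuffle/reduce pattern that both engines implement, using word count as the canonical one-pass, data-parallel job. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`) are illustrative, not from either framework; in Spark the same pipeline would typically be a `flatMap` followed by `reduceByKey`.

```python
from collections import defaultdict
from functools import reduce

def map_phase(records, mapper):
    # Apply the mapper to each input record, emitting (key, value) pairs.
    return [pair for record in records for pair in mapper(record)]

def shuffle_phase(pairs):
    # Group values by key, as the MapReduce shuffle (or Spark's
    # reduceByKey stage) does between the map and reduce steps.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    # Fold each key's list of values down to a single result.
    return {key: reduce(reducer, values) for key, values in groups.items()}

lines = ["spark and mapreduce", "mapreduce and yarn"]
pairs = map_phase(lines, lambda line: [(w, 1) for w in line.split()])
counts = reduce_phase(shuffle_phase(pairs), lambda a, b: a + b)
# counts == {"spark": 1, "and": 2, "mapreduce": 2, "yarn": 1}
```

The point of the sketch is that the whole job is one map step and one grouped reduction; for workloads shaped like this, the extra machinery Spark brings (DAG scheduling, caching) buys little over plain MapReduce.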

One thing Spark can't do yet is an out-of-core sort like the one you get for free from how classic MapReduce works, but that's coming. I suppose there aren't very direct analogs of a few things like MultipleOutputs either.

Upvotes: 2
