Reputation: 6465
I'm coming from a MapReduce background and I'm quite new to Spark. I could not find an article explaining the architectural differences between MapReduce and Spark. My understanding so far is that the only difference between MapReduce and Spark is the notion of 'in-memory' processing: that is, Spark also has map and reduce phases that may run on two different nodes within the cluster, pairs with the same key are transferred to the same reducer, and there is a shuffle phase involved. Am I correct, or is there some difference in the way the mapping and reducing stages are done and...
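For reference, the model described above (map, then shuffle keys to reducers, then reduce) can be sketched in plain Python. This is only an illustration of the MapReduce execution phases, not Hadoop or Spark code:

```python
from collections import defaultdict

def mapper(line):
    # Map phase: emit (key, value) pairs, here (word, 1)
    for word in line.split():
        yield (word, 1)

def shuffle(pairs):
    # Shuffle phase: group values by key, so all pairs with the
    # same key end up at the same reducer
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups.items()

def reducer(key, values):
    # Reduce phase: combine all values for one key
    return (key, sum(values))

lines = ["spark and mapreduce", "spark shuffle"]
mapped = (pair for line in lines for pair in mapper(line))
counts = dict(reducer(k, vs) for k, vs in shuffle(mapped))
```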
Upvotes: 0
Views: 2378
Reputation: 66866
I think it's directly on point, so I don't mind pointing you to a blog post I wrote:
http://blog.cloudera.com/blog/2014/09/how-to-translate-from-mapreduce-to-apache-spark/
Spark is a large superset of MapReduce, in the sense that you can express MapReduce with Spark operators, but a lot of other things too. It has a large set of small operations from which you construct pipelines, so there's no 1:1 mapping, but you can identify how a lot of MapReduce elements correspond to Spark ones. Or, put another way: MapReduce actually gives you two operations that do a lot more than just 'map' and 'reduce', which may not have been obvious so far.
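To make the correspondence concrete: the classic MapReduce word count decomposes in Spark into smaller operators, typically `flatMap` followed by `reduceByKey` (the shuffle happens inside `reduceByKey`). Below is a plain-Python sketch of those two operators, so you can see the shape of the pipeline without a Spark cluster; the function names mirror the RDD API but this is not PySpark code:

```python
from collections import defaultdict
from functools import reduce

def flat_map(f, data):
    # Like RDD.flatMap: apply f to each element and flatten the results
    return [y for x in data for y in f(x)]

def reduce_by_key(f, pairs):
    # Like RDD.reduceByKey: merge all values that share a key.
    # In real Spark, this is the step that triggers the shuffle.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(f, values) for key, values in groups.items()}

lines = ["spark and mapreduce", "spark shuffle"]
pairs = flat_map(lambda line: [(w, 1) for w in line.split()], lines)
counts = reduce_by_key(lambda a, b: a + b, pairs)
```

Note that MapReduce's single 'reduce' bundles together what Spark splits into grouping, sorting, and aggregation operators, which is why there is no one-to-one mapping between the two.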
Upvotes: 1