user1252954

MapReduce for all parallel problems?

I understand that MapReduce is great for solving parallel problems on a huge data set. However, are there any examples of problems that, while parallelizable in some sense, are not a good fit for MapReduce?

Upvotes: 0

Views: 368

Answers (2)

Nikita Ivanov

Reputation: 414

A few observations:

  • We shouldn’t confuse Hadoop, and the early Google implementation of MapReduce that Hadoop copied (i.e. one limited to key/value mapping only), with the general split & aggregate concept that MapReduce is based on.

  • The MapReduce idea (split & aggregate and divide & conquer are just a few other names for it) is about parallelizing processing by splitting it into smaller sub-tasks that can be processed independently in parallel. As such, it can be applied to a wide variety of problems: data-intensive, compute-intensive or otherwise.

  • MapReduce, in general, has nothing to do with big data sets, or with data at all. It is used successfully on small data sets, and in computational MapReduce, where it serves purely to parallelize processing.

  • To answer your question: MapReduce generally doesn’t work in cases where the original task cannot be split into a set of sub-tasks that can be processed independently in parallel. In real life, very few use cases fall into this category, as most non-obvious problems can be approximated for MapReduce-style processing.
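To make the split & aggregate idea concrete, here is a minimal sketch in Python. It is not Hadoop's API; the helper names (`split`, `count_primes`, `split_and_aggregate`, `logistic_orbit`) are hypothetical. It shows a purely computational job (counting primes) split into independent chunks and then aggregated, plus a counter-example where each step depends on the previous one, so there is nothing independent to split. Threads keep the sketch simple; for CPU-bound work in CPython you would likely swap in `ProcessPoolExecutor` to get real parallelism.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator


def split(data, n_chunks):
    """Split a list into at most n_chunks contiguous, independent pieces."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def count_primes(chunk):
    """'Map' step: touches only its own chunk, no shared state."""
    def is_prime(k):
        if k < 2:
            return False
        i = 2
        while i * i <= k:
            if k % i == 0:
                return False
            i += 1
        return True
    return sum(1 for k in chunk if is_prime(k))


def split_and_aggregate(data, map_fn, combine, initial, n_chunks=4):
    """Run map_fn on each chunk in parallel, then fold the partial results."""
    chunks = split(data, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(map_fn, chunks))
    return reduce(combine, partials, initial)


def logistic_orbit(x0, r, steps):
    """Counter-example: step k needs the result of step k-1,
    so the loop cannot be split into independent sub-tasks."""
    x = x0
    for _ in range(steps):
        x = r * x * (1 - x)
    return x


if __name__ == "__main__":
    print(split_and_aggregate(list(range(100)), count_primes, operator.add, 0))  # 25
```

Note that nothing here involves key/value pairs or a large data set: the "data" is just a range of integers, and the point is only that each chunk can be processed without looking at any other.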

Upvotes: 1

user1639464

Yes and no. It really depends on how the problems are structured and written. There are certainly problems that MapReduce will parallelize poorly within a given data step or MapReduce function. Simultaneous equation solvers for symmetric matrices are one example: because the equations must be solved simultaneously, they do not parallelize well if written as one single function, and in many cases they end up loaded onto a single node. A common workaround is to isolate the pre-matrix calculations in a separate step (processor), since those are trivially parallelizable. By breaking the work up this way, the MapReduce optimizer can pick up more nodes, and therefore more processing power, than it otherwise would.
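A minimal sketch of the workaround described above, assuming Python threads as a stand-in for a real MapReduce framework. The matrix formula in `build_row` is hypothetical (a symmetric, diagonally dominant matrix chosen only for illustration), and plain Gaussian elimination with partial pivoting stands in for whatever solver is actually used. The row-building step is embarrassingly parallel; the elimination loop is not, because each column step depends on the one before it, so it runs serially on one "node".

```python
from concurrent.futures import ThreadPoolExecutor


def build_row(args):
    """Parallel 'pre-matrix' step: row i depends only on (i, n).
    Hypothetical formula; symmetric because it uses abs(i - j)."""
    i, n = args
    return [float(n) if i == j else 1.0 / (1 + abs(i - j)) for j in range(n)]


def gaussian_solve(a, b):
    """Serial step: Gaussian elimination with partial pivoting.
    Each column's elimination needs the previous column finished."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def solve_system(n, b, workers=4):
    """Build the matrix rows in parallel, then solve on a single worker."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        a = list(pool.map(build_row, [(i, n) for i in range(n)]))
    return a, gaussian_solve(a, b)


if __name__ == "__main__":
    a, x = solve_system(5, [1.0, 2.0, 3.0, 4.0, 5.0])
    print(x)
```

For a symmetric positive-definite matrix a Cholesky factorization would be the more usual serial step; Gaussian elimination is used here only because it is short and self-contained.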

Upvotes: 0
