Reputation:
I understand that MapReduce is great for solving parallel problems on a huge data set. However, are there any examples of problems that, while in some sense parallelizable, are not a good fit for MapReduce?
Upvotes: 0
Views: 368
Reputation: 414
A few observations:
We shouldn't confuse Hadoop and the early Google implementation of MapReduce that Hadoop copied (i.e. limited to key/value mapping only) with the general split & aggregate concept that MapReduce is based on
The MapReduce idea (split & aggregate and divide & conquer are just a couple of other names for it) is about parallelizing processing by splitting work into smaller sub-tasks that can be processed independently in parallel - and as such it can be applied to a wide variety of problems (data intensive, compute intensive or otherwise)
MapReduce, in general, has nothing to do with big data sets, or with data at all. It is successfully used on small data sets, and in computational MapReduce it is employed purely for processing parallelization (see the sketch after this list)
To answer your question: MapReduce generally doesn't work in cases where the original task cannot be split into a set of sub-tasks that can be processed independently in parallel. In real life, very few use cases fall into this category, as most non-obvious problems can be approximated for MapReduce-type processing.
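To make the "computational MapReduce" point concrete, here is a minimal sketch in Python (my illustration, not tied to Hadoop or any particular framework): a compute-intensive Monte Carlo estimate of pi is split into independent map tasks and combined in a reduce step. The function names and the use of multiprocessing.Pool are illustrative assumptions only.

```python
# Sketch of split & aggregate applied to a compute-intensive task,
# not a large data set: estimate pi by Monte Carlo sampling.
import random
from multiprocessing import Pool

def map_task(samples):
    """Independent sub-task: count random points falling inside the unit circle."""
    hits = 0
    for _ in range(samples):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

def reduce_results(partial_hits, total_samples):
    """Aggregate step: combine the independent partial counts into one estimate."""
    return 4.0 * sum(partial_hits) / total_samples

if __name__ == "__main__":
    workers, samples_per_worker = 4, 1_000_000
    with Pool(workers) as pool:
        partial = pool.map(map_task, [samples_per_worker] * workers)
    print(reduce_results(partial, workers * samples_per_worker))
```

The point of the sketch is that each map task is self-contained, so adding workers scales the computation even though there is essentially no input data to distribute.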
Upvotes: 1
Reputation:
Yes and no. It really depends on how the problems are structured and written. There are certainly problems that MapReduce will parallelize poorly within a given data step/map-reduce function. Simultaneous equation solvers for symmetric matrices are one example: they do not parallelize well, for the obvious reason of simultaneity, if written as one single function (in many cases they end up loaded onto a single node). A common workaround is to isolate the pre-matrix calculations in a separate processing step, as they are trivially parallelizable. By breaking the problem up this way, the map-reduce optimizer is able to pick up more nodes, and so more processing power, than it would otherwise.
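A minimal sketch of that split, using forward substitution on a triangular system as a stand-in for the simultaneity problem (the example and all names are my own illustration, not the answerer's code): the per-row setup work is independent and map-friendly, while the solve loop itself is inherently sequential because each unknown depends on the ones already computed.

```python
def assemble_row(i, source_data):
    """Independent pre-matrix work: build the coefficients for one row.
    Each call depends only on its own inputs, so this step is map-friendly."""
    return [source_data[i][j] for j in range(i + 1)]

def forward_substitution(L, b):
    """Sequential core: x[i] needs x[0..i-1], so the iterations cannot
    be split into independent sub-tasks."""
    x = [0.0] * len(b)
    for i in range(len(b)):
        s = sum(L[i][j] * x[j] for j in range(i))
        x[i] = (b[i] - s) / L[i][i]
    return x

# The row assembly could be farmed out to many workers in a map step,
# but forward_substitution itself stays on a single worker.
raw = [[2.0], [1.0, 3.0], [0.5, 1.0, 4.0]]
L = [assemble_row(i, raw) for i in range(3)]      # parallelizable step
print(forward_substitution(L, [2.0, 5.0, 9.0]))   # sequential step
```

Separating the two steps is exactly the workaround described above: the framework can spread the cheap, independent setup across nodes even though the solve itself remains serial.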
Upvotes: 0