udi
udi

Reputation: 508

Difference between Threading and Map-Reduce processing?

One of my collegue is arguing with me for introducing map-reduce concept in our application(text processing). His opinion is why we should not use threading concepts instead.We both are new to this map-reduce paradigm. I thought that using map-reduce concept helps the developer from the overhead of handling thread synchronisation,dead lock,shared data. Is there anything other than this for going to map-reduce concept rather than threading?

Upvotes: 5

Views: 5161

Answers (2)

user123
user123

Reputation: 5407

You can find related paper for this, Comparing Fork/Join and MapReduce.

The paper compares the performance, scalability and programmability of three parallel paradigms: fork/join, MapReduce, and a hybrid approach.

What they find is basically that Java fork/join has low startup latency and scales well for small inputs (<5MB), but it cannot process larger inputs due to the size restrictions of shared-memory, single node architectures. On the other hand, MapReduce has significant startup latency (tens of seconds), but scales well for much larger inputs (>100MB) on a compute cluster.

Threading offers facilities to partition a task into several subtasks, in a recursive-looking fashion; more tiers, possibility of 'inter-fork' communication at this stage, much more traditional programming. Does not extend (at least in the paper) beyond a single machine. Great for taking advantage of your eight-core.

M-R only does one big split, with the mapped splits not talking between each other at all, and then reduces everything together. A single tier, no inter-split communication until reduce, and massively scalable. Great for taking advantage of your share of the cloud.

Upvotes: 4

Judge Mental
Judge Mental

Reputation: 5239

Map-reduce adds tons of overhead, but can work to coordinate a large fleet of machines for an "embarrassingly parallel" use case. Threading is only worth it if you have multiple cores and only a single host, but there are many frameworks which add layers of abstraction above raw threads (e.g. Concurrent, Akka) that are easier in general to work with.

Upvotes: 2

Related Questions