user1179295

Reputation: 736

Hadoop / AWS elastic map reduce performance

I am looking for a ballpark figure, if anyone has experience with this...

Does anyone have benchmarks on the speed of AWS's Elastic MapReduce?

Let's say I have 100 million records and I am using Hadoop Streaming (a PHP script) to map, group, and reduce (with some simple PHP calculations). The average group will contain 1-6 records.
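To make that concrete, here is roughly the kind of mapper/reducer pair I mean (the column layout and the per-group sum are simplified stand-ins for the real calculations):

    #!/usr/bin/env php
    <?php
    // mapper.php -- reads one record per line from STDIN and emits
    // "group_key<TAB>value" so Hadoop Streaming can sort and group
    // the output by key before it reaches the reducer.
    // The column layout (id, group_key, amount) is just an example.
    while (($line = fgets(STDIN)) !== false) {
        $fields = explode(",", trim($line));
        if (count($fields) < 3) {
            continue; // skip malformed records
        }
        list(, $groupKey, $amount) = $fields;
        echo $groupKey . "\t" . $amount . "\n";
    }

    #!/usr/bin/env php
    <?php
    // reducer.php -- input arrives sorted by key, so all lines for one
    // group (typically 1-6 records here) are contiguous. Summing the
    // value stands in for whatever the real per-group calculation is.
    $currentKey = null;
    $sum = 0.0;
    while (($line = fgets(STDIN)) !== false) {
        list($key, $value) = explode("\t", trim($line), 2);
        if ($key !== $currentKey) {
            if ($currentKey !== null) {
                echo $currentKey . "\t" . $sum . "\n"; // emit finished group
            }
            $currentKey = $key;
            $sum = 0.0;
        }
        $sum += (float) $value;
    }
    if ($currentKey !== null) {
        echo $currentKey . "\t" . $sum . "\n";
    }

The two scripts get passed to the streaming jar with its -mapper and -reducer options; Hadoop sorts the mapper output by key so each group reaches the reducer contiguously.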

Also, is it better/more cost-effective to run a bunch of small instances or fewer larger ones? I realize the work is broken up into nodes within an instance, but regardless, will larger nodes have higher I/O, making them faster per node per server (and more cost-efficient)?

Also, with streaming, how is the ratio of mappers to reducers determined?

Upvotes: 0

Views: 500

Answers (1)

Sean Owen

Reputation: 66866

I don't know that you can give a meaningful benchmark -- it's kind of like asking how fast a computer program generally runs. It's not possible to say how fast your program will run without knowing something about the script.

If you mean how fast the instances that power an EMR job are: they're the same spec as the underlying EC2 instances that you specify from AWS.

If you want a very rough take on how EMR performs: I'd say you will probably run into an I/O bottleneck before a CPU bottleneck.

In theory this means you should run many small instances and ask for rack diversity, in order to maybe grab more I/O resources across more machines rather than letting them compete. In practice I've found that fewer, higher-I/O instances can be more effective. But even this impression doesn't always hold -- it really depends on how busy the zone is and where your jobs are scheduled.
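If you want to settle it empirically, the simplest approach is to time the same job on two cluster shapes. As a rough sketch (the instance types and counts below are placeholders, not recommendations), the two configurations map onto the EMR RunJobFlow "Instances" structure like this:

    <?php
    // Option A: many small nodes -- spread the I/O across more machines.
    $manySmallNodes = [
        'MasterInstanceType' => 'm1.small',
        'SlaveInstanceType'  => 'm1.small',
        'InstanceCount'      => 20,
    ];

    // Option B: fewer, larger nodes -- more I/O per node, less contention.
    $fewLargeNodes = [
        'MasterInstanceType' => 'm1.xlarge',
        'SlaveInstanceType'  => 'm1.xlarge',
        'InstanceCount'      => 5,
    ];

    // Either array would go in the 'Instances' field of a RunJobFlow
    // request (e.g. via the AWS SDK for PHP); running the same job on
    // both shapes tells you which is faster and cheaper for your data.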

Upvotes: 1
