I have set up a 3-node Hadoop cluster with Cloudera Manager (CDH4). When I ran a Pig job in MapReduce mode, it took twice as long as it did in local mode on the same data set. Is that expected behavior? Also, is there any documentation on performance-tuning options for MapReduce jobs?
Thanks much for any help!
Another reason is that when you run in `-x local` mode, Pig does not do the same jar compilation it does in MapReduce mode. With small data sets and a complex Pig script, the jar compilation time alone becomes noticeable.
A good start for performance tuning is the "Making Pig Fly" chapter from the "Programming Pig" book.
This is probably because you are using a toy dataset, and the fixed overhead of MapReduce (job submission, task scheduling, JVM startup for each task) is larger than the benefit of parallelization.
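To make the overhead argument concrete, here is a rough back-of-the-envelope cost model. It is only a sketch under simplifying assumptions. Local mode is assumed to cost `records * per_record_s`. MapReduce mode is assumed to add a fixed `overhead_s` (job setup, and any jar compilation as mentioned above) and then split the work perfectly across `nodes`. Real jobs also pay for shuffle, skew, and slot limits, and every number below is hypothetical.

```python
def local_time(records: int, per_record_s: float) -> float:
    """Hypothetical local-mode cost: all records processed serially."""
    return records * per_record_s


def mapreduce_time(records: int, per_record_s: float, nodes: int, overhead_s: float) -> float:
    """Hypothetical MapReduce cost: fixed overhead plus perfectly parallel work."""
    return overhead_s + records * per_record_s / nodes


def break_even_records(per_record_s: float, nodes: int, overhead_s: float) -> float:
    """Record count at which both modes take equally long under this model.

    Solves overhead + r*p/k = r*p  =>  r = overhead / (p * (1 - 1/k)).
    """
    if nodes < 2:
        raise ValueError("parallelism only helps with at least 2 nodes")
    return overhead_s / (per_record_s * (1 - 1 / nodes))


if __name__ == "__main__":
    # Assumed numbers: 10 microseconds per record, 3 nodes, 30 s fixed overhead.
    p, k, overhead = 1e-5, 3, 30.0
    for n in (100_000, 10_000_000):
        print(n, local_time(n, p), mapreduce_time(n, p, k, overhead))
    print("break-even around", round(break_even_records(p, k, overhead)), "records")
```

With these made-up parameters, a 100,000-record input is far slower in MapReduce mode. Only inputs in the millions of records come out ahead, which matches the intuition that a small test set will not show the cluster's benefit.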