Difference between Pig in local mode vs pig-withouthadoop.jar

Question

I wanted to know what the performance gain or loss is if I use Pig in local mode (which internally calls MapReduce) versus using the pig-withouthadoop.jar file.

Does pig-withouthadoop.jar really not use Hadoop?

And if I only want to use Pig without a cluster, for example to design a data flow, what should I use: Pig in local mode or the pig-withouthadoop.jar file?

I have currently written my script using Pig local mode. While trying to deploy it on a server and set up Pig in local mode, I think I also need to set the HADOOP_HOME environment variable before setting the PIG_HOME variable.

Kindly advise.

Thanks in advance. :)

hello_abhishek · Accepted Answer

Let me answer your questions in order:

1) Regarding performance: if the file size and the Pig script are held constant between local mode and Hadoop mode, processing will be faster in local mode, because all the tasks are performed in a single JVM. In Hadoop mode, the input file has to be carried to the data nodes, and the Pig script and UDFs also have to be shipped to the cluster, which takes more time. In both cases, however, the Pig script and UDFs are internally converted into map and reduce tasks, and the number of map and reduce classes constructed is the same. We can check this with the EXPLAIN command.

2) No. pig-withouthadoop.jar still needs Hadoop; it simply leaves out the Hadoop jars and uses those of an existing Hadoop installation instead. The regular Pig distribution internally contains a bundle of Hadoop jars, so even if you haven't started Hadoop with the start-all.sh command, Pig will work because it uses those bundled jars. The interesting part is that if you have installed Hadoop and then use Pig without starting Hadoop, it sometimes will not work because of a Hadoop version mismatch. So, to be on the safe side, start Hadoop explicitly. Either way, Pig always uses Hadoop. :)

3) If the input files are small, always use Pig local mode. As explained above, Pig comes with Hadoop jars by default, so no cluster is needed.

4) Yes, you need to set HADOOP_HOME if you are using an explicitly installed Hadoop.
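A minimal sketch of the server setup discussed in the question and in point 4, assuming a Linux server with a Bash-like shell. The install paths below and the script name myscript.pig are placeholders, not values from the original post; adjust them to wherever Hadoop and Pig are actually installed. These lines would typically go in a shell profile such as ~/.bashrc.

```shell
# Placeholder install locations (assumption): change these to match your server.
export HADOOP_HOME=/usr/local/hadoop   # only needed when using an explicitly installed Hadoop
export PIG_HOME=/usr/local/pig

# Put the hadoop and pig launchers on the PATH.
export PATH="$PATH:$HADOOP_HOME/bin:$PIG_HOME/bin"

# The same script can then be run in either execution mode:
#   pig -x local myscript.pig       # single JVM, local file system
#   pig -x mapreduce myscript.pig   # jobs submitted to the Hadoop cluster
```

Keeping the mode choice on the command line (-x local vs -x mapreduce) means the Pig script itself does not need to change between designing the data flow locally and deploying it to a cluster.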


