Bhagwant

Reputation: 161

Running a Pig script on multiple nodes

I have configured a Hadoop cluster with three nodes. All nodes are working fine and are connected.

I have uploaded a 28 GB file to HDFS and am running a Pig script to process it. While the script executes, it runs on a single node only.

Could you please advise and explain why it runs on a single node only? Am I missing something in the configuration?

I am using Hadoop 2.2.0 and Pig 0.12.

Upvotes: 1

Views: 850

Answers (1)

Jakub Kotowski

Reputation: 7571

Did you try setting parallel in your script? It controls the number of reduce tasks, so with three nodes you can try values up to parallel 3 as a starting point. It makes sense to use it with any of the following operators:

  • group
  • cogroup
  • join
  • limit
  • order
  • distinct

Example of the syntax: group x by y parallel 3;
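As a slightly fuller sketch of the same idea, the script below sets a script-wide default and also uses parallel on a single operator. The input path, output path, delimiter, and field names are hypothetical placeholders, not taken from the question. Keep in mind that parallel only affects the reduce phase; the number of map tasks is determined by the input splits.

```pig
-- Minimal sketch; paths, delimiter and schema are assumptions for illustration.

-- Script-wide default number of reduce tasks for operators that start a reduce phase.
SET default_parallel 3;

-- Hypothetical input: tab-separated lines with a key field y and a numeric value.
x = LOAD '/data/input.txt' USING PigStorage('\t') AS (y:chararray, value:long);

-- PARALLEL on an operator overrides the default for that operator only.
grouped = GROUP x BY y PARALLEL 3;

-- One output row per key, with the sum of its values.
totals = FOREACH grouped GENERATE group AS y, SUM(x.value) AS total;

STORE totals INTO '/data/output';
```

If the job still shows a single map task after this, the input format is the more likely cause than the reducer count.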

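What format is your file? Make sure it is splittable. For example, a gzip-compressed file cannot be split, so it would be read by a single map task no matter how many nodes you have.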

Also check that your cluster is working and set up correctly. For example, check that the TaskTrackers (NodeManagers in YARN) are not failing, and make sure that the slaves and master files are set correctly on all nodes (slaves lists all the slave nodes, master lists the master).

Upvotes: 1
