Ajn

Reputation: 583

Running Pig Jobs remotely

I am learning Pig and want to run a Pig script on a remote cluster from Java code using PigServer. Can anybody guide me on how to achieve this? Thanks in advance.

Upvotes: 1

Views: 1222

Answers (2)

Pankaj Khattar

Reputation: 111

Can the above code be used to make a remote call, i.e. when Pig is installed on cluster1 and the call is made from an application server outside the cluster?

Upvotes: 3

Charles Menguy

Reputation: 41428

You have to use the PigServer class to connect to your cluster, register your Pig queries, and get the results. You can either run a script by passing the path of a file on your disk, or write your Pig script lines directly and pass them as Java strings.

To pass a Pig script by filename:

PigServer pig = new PigServer(ExecType.MAPREDUCE);
pig.registerScript("/path/to/test.pig");

To pass your Pig program as strings:

PigServer pig = new PigServer(ExecType.MAPREDUCE);
pig.registerQuery("A = LOAD 'something' USING PigLoader();");

You can get the results back this way, for example:

Iterator<Tuple> i = pig.openIterator("A");
HashMap<Integer, Integer> map = new HashMap<Integer, Integer>();
while (i.hasNext()) {
    Integer val = DataType.toInteger(i.next().get(0));
    map.put(val, val);            
}

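Note that the cluster connection properties, namely fs.default.name and mapred.job.tracker, must be available on your classpath (for example, through your Hadoop configuration files). Alternatively, you can pass them to the PigServer constructor in a Properties object.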
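For the constructor route, a minimal sketch of building those two properties might look like the following. The class name RemotePigConfig, the helper method, and the host names and ports are all hypothetical placeholders, not values from your cluster. The PigServer call is left as a comment so that the snippet compiles without Pig on the classpath.

```java
import java.util.Properties;

// Hypothetical helper (not part of Pig): collects the two connection
// properties mentioned above into a java.util.Properties object.
final class RemotePigConfig {

    static Properties clusterProperties(String nameNodeHost, int nameNodePort,
                                        String jobTrackerHost, int jobTrackerPort) {
        Properties props = new Properties();
        // URI of the HDFS NameNode on the remote cluster
        props.setProperty("fs.default.name",
                "hdfs://" + nameNodeHost + ":" + nameNodePort);
        // Address of the JobTracker, written as host:port
        props.setProperty("mapred.job.tracker",
                jobTrackerHost + ":" + jobTrackerPort);
        return props;
    }

    public static void main(String[] args) {
        // Placeholder hosts and ports: replace with your cluster's values.
        Properties props = clusterProperties(
                "namenode.example.com", 8020,
                "jobtracker.example.com", 8021);

        // With the Pig and Hadoop jars on the classpath, the properties
        // would then be handed to the constructor:
        // PigServer pig = new PigServer(ExecType.MAPREDUCE, props);

        System.out.println(props);
    }
}
```

Setting the properties in code like this keeps the application server independent of any Hadoop configuration files, which may help when the caller sits outside the cluster.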

Upvotes: 2
