Reputation: 583
I am learning Pig and want to run a Pig script on a remote cluster from Java code using PigServer. Can anybody guide me on how to achieve this? Thanks in advance.
Upvotes: 1
Views: 1222
Reputation: 111
Can the above code be used to make a remote call, i.e. Pig is installed on cluster1 and the call is made from an application server outside the cluster?
Upvotes: 3
Reputation: 41428
You have to use the PigServer class to connect to your cluster, register your Pig queries, and get results. You can either run a script by passing the path to a file on disk, or write your Pig script lines directly and pass them as Java strings.
To run a Pig script from a file:
PigServer pig = new PigServer(ExecType.MAPREDUCE);
pig.registerScript("/path/to/test.pig");
To pass your Pig program as Strings:
PigServer pig = new PigServer(ExecType.MAPREDUCE);
pig.registerQuery("A = LOAD 'something' USING PigLoader();");
You can get the results back like this, for example:
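// openIterator executes the plan for alias A and iterates over its tuples
Iterator<Tuple> i = pig.openIterator("A");
HashMap<Integer, Integer> map = new HashMap<Integer, Integer>();
while (i.hasNext()) {
    // Read the first field of each tuple as an Integer
    Integer val = DataType.toInteger(i.next().get(0));
    map.put(val, val);
}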
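Note that you need to have some properties available on your classpath, namely fs.default.name and mapred.job.tracker, or you can pass them to the PigServer constructor. These two properties tell Pig where the cluster's NameNode and JobTracker are, which is how a client running outside the cluster locates it.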
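As a rough sketch of the constructor option, the two properties can be collected in a java.util.Properties object and handed to PigServer. The class name RemotePigConfig, the helper clusterProperties, and both host names and ports are placeholders for illustration; substitute your own cluster's values. The PigServer lines are left as comments so the snippet compiles without the Pig and Hadoop jars.

```java
import java.util.Properties;

public class RemotePigConfig {

    // Hypothetical helper: collects the two Hadoop properties named above.
    public static Properties clusterProperties(String nameNodeUri, String jobTracker) {
        Properties props = new Properties();
        props.setProperty("fs.default.name", nameNodeUri);    // default filesystem (NameNode) URI
        props.setProperty("mapred.job.tracker", jobTracker);  // JobTracker host:port
        return props;
    }

    public static void main(String[] args) {
        // Placeholder addresses, not real hosts.
        Properties props = clusterProperties(
                "hdfs://namenode.example.com:8020",
                "jobtracker.example.com:8021");

        System.out.println(props.getProperty("fs.default.name"));
        System.out.println(props.getProperty("mapred.job.tracker"));

        // With the Pig and Hadoop jars on the classpath, the properties
        // would be passed to the PigServer constructor:
        // PigServer pig = new PigServer(ExecType.MAPREDUCE, props);
        // pig.registerScript("/path/to/test.pig");
    }
}
```

The application server still needs network access to the NameNode and JobTracker, and its client jars should match the Hadoop version the cluster runs.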
Upvotes: 2