plzdontkillme

Reputation: 1527

PIG output difference on grunt and through java

When I run my Pig script in grunt, the output looks good. Here is an example:

2013-07-08 16:58:40,640 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2013-07-08 16:58:40,647 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-07-08 16:58:40,647 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
((email,[email protected]),{(rrr24,(full_name,rachana),email,(state,ca),(birth_year,2013),(gender,female),{}),(rrr,(full_name,rachana),email,(state,ca),(birth_year,2013),(gender,female),{}),(rrr10,(full_name,rachana),email,(state,ca),(birth_year,2013),(gender,female),{}),(rrr20,(full_name,rachana),email,(state,ca),(birth_year,2013),(gender,female),{}),(rrr23,(full_name,rachana),email,(state,ca),(birth_year,2013),(gender,female),{}),(rrr9,(full_name,rachana),email,(state,ca),(birth_year,2013),(gender,female),{}),(rrr8,(full_name,rachana),email,(state,ca),(birth_year,2013),(gender,female),{}),(rrr22,(full_name,rachana),email,(state,ca),(birth_year,2013),(gender,female),{}),(rrr21,(full_name,rachana),email,(state,ca),(birth_year,2013),(gender,female),{})})
((email,[email protected]),{(rrr0,(full_name,rachana),email,(state,ca),(birth_year,2013),(gender,female),{}),(rrr6,(full_name,rachana),email,(state,ca),(birth_year,2013),(gender,female),{}),(rrr7,(full_name,rachana),email,(state,ca),(birth_year,2013),(gender,female),{}),(rrr3,(full_name,rachana),email,(state,ca),(birth_year,2013),(gender,female),{}),(rrr1,(full_name,rachana),email,(state,ca),(birth_year,2013),(gender,female),{}),(rrr5,(full_name,rachana),email,(state,ca),(birth_year,2013),(gender,female),{}),(rrr4,(full_name,rachana),email,(state,ca),(birth_year,2013),(gender,female),{}),(rrr2,(full_name,rachana),email,(state,ca),(birth_year,2013),(gender,female),{})})
grunt> 

I can see the values for full_name, email, birth_year and gender. But when I run the same script from Java:

package com.chegg.hwh.tracking.dao;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class HWHDataPigMapReduce {

    public static void main(String[] args) throws Exception {
        PigServer pigServer = new PigServer(ExecType.LOCAL);

        pigServer.registerQuery("rows = LOAD 'cassandra://hwh_tracking/users' USING org.apache.cassandra.hadoop.pig.CassandraStorage();");
        pigServer.registerQuery("emailgroup = group rows by email;");
        pigServer.dumpSchema("emailgroup");

    }

}

Output:

emailgroup: {group: (name: chararray,value: chararray),rows: {(key: chararray,full_name: (name: chararray,value: chararray),email: (name: chararray,value: chararray),state: (name: chararray,value: chararray),birth_year: (name: chararray,value: long),gender: (name: chararray,value: chararray),columns: {(name: chararray,value: bytearray)})}}

I tried adding as (full_name:chararray), but it made no difference. What am I missing here? Can anyone help?

Upvotes: 0

Views: 105

Answers (1)

Frederic

Reputation: 3284

In the Java code you're calling dumpSchema(String alias), which is similar to calling DESCRIBE in grunt: it prints the schema of the alias, not its data. This is why the output is different.
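To make the difference concrete, here is a rough sketch of the two grunt commands side by side. It reuses the two statements from your Java code. I am assuming your grunt session ended with a DUMP, since the question does not show that part.

```pig
-- Same two statements as in the Java code, typed at the grunt prompt
rows = LOAD 'cassandra://hwh_tracking/users' USING org.apache.cassandra.hadoop.pig.CassandraStorage();
emailgroup = GROUP rows BY email;

-- Prints only the schema of the alias; this is what dumpSchema("emailgroup") corresponds to
DESCRIBE emailgroup;

-- Runs the job and prints the tuples (assumed to be how the grunt output in the question was produced)
DUMP emailgroup;
```

So the Java program is not losing any data. It just never asks Pig to run the query and return the rows.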

You could store the result of the query as follows: pigServer.store("emailgroup", "out");

You could also try getExamples(), though I have never used it.

http://pig.apache.org/docs/r0.11.1/api/org/apache/pig/PigServer.html

Upvotes: 1
