Mohd Zoubi

Reputation: 186

How to get this output from Pig Latin in MapReduce

I want to get the following output from Pig Latin / Hadoop:

((39,50,60,42,15,Bachelor,Male),5)
((40,35,HS-grad,Male),2)
((39,45,15,30,12,7,HS-grad,Female),6)

from the following data sample (a sample of the adult data).

I have written the following Pig Latin script:

sensitive = LOAD '/mdsba/sample2.csv' using PigStorage(',') as (AGE,EDU,SEX,SALARY);
BV = group sensitive by (EDU,SEX);
BVA = foreach BV generate group as EDU, COUNT(sensitive) as dd:long;
Dump BVA;

Unfortunately, the results come out like this:

((Bachelor,Male),5)
((HS-grad,Male),2)

Upvotes: 0

Views: 136

Answers (1)

kecso

Reputation: 2485

Then try projecting the AGE field too (sensitive.AGE gives a bag of the ages in each group). Something like this:

BVA= foreach BV generate 
    sensitive.AGE as AGE,
    FLATTEN(group) as (EDU,SEX), 
    COUNT(sensitive) as dd:long;

Another suggestion is to specify the data types when you load the data:

sensitive = LOAD '/mdsba/sample2.csv' using PigStorage(',') as (AGE:int,EDU:chararray,SEX:chararray,SALARY:chararray);
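
If the goal is output whose printed shape is closer to the one in the question (ages and group keys inside one tuple, followed by the count), one possible approach is to join the ages into a single string with the built-in BagToString and wrap it with the group keys using TOTUPLE. This is only a sketch, not tested against the sample file: the path and field names are taken from the question, the alias grp is made up for illustration, and the order of ages within each group is not guaranteed.

```pig
-- Sketch only: path and field names come from the question.
sensitive = LOAD '/mdsba/sample2.csv' USING PigStorage(',')
    AS (AGE:int, EDU:chararray, SEX:chararray, SALARY:chararray);

BV = GROUP sensitive BY (EDU, SEX);

-- BagToString joins the ages of each group into one comma-separated chararray.
-- TOTUPLE wraps that string together with the two group keys.
-- 'grp' is a hypothetical alias chosen for this example.
BVA = FOREACH BV GENERATE
    TOTUPLE(BagToString(sensitive.AGE, ','), group.EDU, group.SEX) AS grp,
    COUNT(sensitive) AS dd:long;

DUMP BVA;
```

Note that the ages end up as one chararray field rather than separate integer fields, so this only suits display; for further processing, the bag from sensitive.AGE is easier to work with.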

Upvotes: 1
