user2922308

Reputation: 11

Convert data into a specific format in Apache Pig

I want to convert data into a specific format in Apache Pig so that I can use a reporting tool on top of it.

For example:

10:00,abc
10:00,cde
10:01,abc
10:01,abc
10:02,def
10:03,efg

The output should be in the following format:

        abc  cde  def  efg
10:00   1    1    0    0
10:01   2    0    0    0
10:02   0    0    1    0
10:03   0    0    0    1

The main problem is that a value can occur multiple times for the same timestamp, and the CSV file can contain up to 120 distinct values, each of which needs its own column.

Any suggestions to tackle this are more than welcome.

Thanks, Gagan

Upvotes: 0

Views: 138

Answers (1)

Davis Broda

Reputation: 4125

Try something like the following:

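A = load 'data' using PigStorage(',') as (key:chararray, value:chararray);

-- One 0/1 flag column per distinct value.
B = foreach A generate key, (value=='abc'?1:0) as abc, (value=='cde'?1:0) as cde, (value=='def'?1:0) as def, (value=='efg'?1:0) as efg;

C = group B by key;

-- SUM the flags; COUNT would also count the 0 entries.
D = foreach C generate group as key, SUM(B.abc) as abc, SUM(B.cde) as cde, SUM(B.def) as def, SUM(B.efg) as efg;

That should get you a count of the occurrences of each value for each key.

EDIT: I just noticed the limit of 120 mentioned in the question. If the counts must not exceed 120, add the following:

-- Both branches of ?: must have the same type, so the count is cast to chararray.
E = foreach D generate key, (abc>120?'OVER 120':(chararray)abc) as abc, (cde>120?'OVER 120':(chararray)cde) as cde, (def>120?'OVER 120':(chararray)def) as def, (efg>120?'OVER 120':(chararray)efg) as efg;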
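
With up to 120 distinct values, typing one flag column and one aggregate per value by hand gets tedious. One option is to generate the Pig statements from a list of the values. The following is only a rough sketch in Python: the helper name build_pivot_script and its path parameter are made up for illustration, and it assumes every value is a valid, non-reserved Pig alias containing no quote characters.

```python
def build_pivot_script(values, path="data"):
    """Return Pig Latin text that counts each value per key, one column per value.

    Hypothetical helper: assumes each value is usable as a Pig alias
    (letters, digits, underscores, not a reserved word) and has no quotes.
    """
    # One 0/1 flag per value, e.g. (value == 'abc' ? 1 : 0) AS abc
    flags = ", ".join(f"(value == '{v}' ? 1 : 0) AS {v}" for v in values)
    # One SUM per flag column, e.g. SUM(B.abc) AS abc
    sums = ", ".join(f"SUM(B.{v}) AS {v}" for v in values)
    return "\n".join([
        f"A = LOAD '{path}' USING PigStorage(',') AS (key:chararray, value:chararray);",
        f"B = FOREACH A GENERATE key, {flags};",
        "C = GROUP B BY key;",
        f"D = FOREACH C GENERATE group AS key, {sums};",
    ])


# The four values from the question's sample; a real list could hold up to 120.
script = build_pivot_script(["abc", "cde", "def", "efg"])
print(script)
```

The printed text can be saved to a .pig file and run as usual. The list of values still has to come from somewhere, for example a separate DISTINCT pass over the value column.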

Upvotes: 1
