devdreamer

Reputation: 179

Reshape data in Pig - change row values to column names

Is there a way to reshape data in Pig so that the values of one column become column names?

The data looks like this:

id | p1 | count   
1  | "Accessory" | 3    
1  | "clothing" | 2     
2  | "Books" | 1   

I want to reshape the data so that the output looks like this:

id | Accessory | clothing | Books    
1  | 3  |  2 | 0    
2  | 0  |  0 | 1

Can anyone suggest a way to do this?

Upvotes: 0

Views: 78

Answers (1)

Murali Rao

Reputation: 2287

If the set of product names is fixed, the code below might help. Otherwise, you can write a custom UDF to achieve this.

Input (a.csv):

1|Accessory|3
1|Clothing|2
2|Books|1

Pig snippet:

-- Load the pipe-delimited input; each row is (id, product name, count).
test = LOAD 'a.csv' USING PigStorage('|') AS (product_id:long, product_name:chararray, rec_cnt:long);
req_stats = FOREACH (GROUP test BY product_id) {
    -- One filtered bag per known product name (the comparison is case-sensitive).
    accessory = FILTER test BY product_name == 'Accessory';
    clothing = FILTER test BY product_name == 'Clothing';
    books = FILTER test BY product_name == 'Books';
    -- Emit '0' when the id has no row for a product, otherwise its count as a string.
    GENERATE group AS product_id,
        (IsEmpty(accessory) ? '0' : BagToString(accessory.rec_cnt)) AS a_cnt,
        (IsEmpty(clothing) ? '0' : BagToString(clothing.rec_cnt)) AS c_cnt,
        (IsEmpty(books) ? '0' : BagToString(books.rec_cnt)) AS b_cnt;
};

DUMP req_stats;

Output of DUMP req_stats:

(1,3,2,0)
(2,0,0,1)
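
Note that BagToString returns a chararray, so the counts above come out as strings, and if an id had more than one row for the same product, the values would be concatenated rather than added. If numeric columns are needed, one possible variation is to turn each row into one count column per product and then SUM within each group. This is only a sketch under the same fixed-product-list assumption; the relation name `flagged` is made up for illustration and the input is assumed to be the same a.csv.

```pig
-- Same input and schema as in the snippet above.
test = LOAD 'a.csv' USING PigStorage('|') AS (product_id:long, product_name:chararray, rec_cnt:long);

-- Put each row's count in the column of its product and 0 in the others.
-- 'flagged' is a hypothetical relation name.
flagged = FOREACH test GENERATE product_id,
    (product_name == 'Accessory' ? rec_cnt : 0L) AS a_cnt,
    (product_name == 'Clothing' ? rec_cnt : 0L) AS c_cnt,
    (product_name == 'Books' ? rec_cnt : 0L) AS b_cnt;

-- Add up the per-product columns for each id; the results stay numeric.
req_stats = FOREACH (GROUP flagged BY product_id) GENERATE
    group AS product_id,
    SUM(flagged.a_cnt) AS a_cnt,
    SUM(flagged.c_cnt) AS c_cnt,
    SUM(flagged.b_cnt) AS b_cnt;

DUMP req_stats;
```

For the sample input this should give the same two tuples as above, with the counts as longs instead of strings. A product name that is not in the hard-coded list contributes 0 to every column, so it is silently dropped.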

Upvotes: 1
