Currently my output looks like this:
((130,1))
((131,1))
((132,1))
((133,1))
((137,1))
((138,2))
((139,1))
((140,1))
((142,2))
((143,1))
I want it to look like this:
130 1
131 1
132 1
Here is my code:
A = LOAD 'user-links-small.txt' AS (user_a: int, user_b: int);
B = ORDER A BY user_a;
grouped = COGROUP B BY user_a;
C = FOREACH grouped GENERATE COUNT(B);
D = COGROUP C BY $0;
E = FOREACH D GENERATE($0, COUNT($1));
DUMP E;
I was looking through these forums, and some people suggested that the way to do this is to write a user-defined function (UDF). I can try that, but I am new to Pig and want to learn its built-in functions in more detail first. I found something about FLATTEN() but couldn't get this output with it. Is there a way to remove the parentheses and commas as shown? Thanks in advance for any help!
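If you use the DUMP command, each record is printed as a tuple: all of its fields are enclosed in parentheses and separated by commas. In your script, GENERATE($0, COUNT($1)) also builds an extra tuple inside each record, which is why you see two sets of parentheses.
You can remove the inner parentheses with the FLATTEN operator, and the outer parentheses and the comma by writing the result with the STORE command instead of DUMP.
Try this: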
E = FOREACH D GENERATE FLATTEN(($0, COUNT($1)));
STORE E INTO 'output' USING PigStorage(' ');
Go to the 'output' folder and open the file whose name starts with part (for example part-r-00000). You will see output like this:
130 1
131 1
132 1
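A possibly simpler variant is to not build the inner tuple in the first place: drop the extra parentheses in the last GENERATE, so each record already has two top-level fields and FLATTEN is not needed. The sketch below is untested and assumes the same input file as the question. The aliases cnt and num_users are illustrative names that are not in the original post. It uses GROUP, which does the same thing as COGROUP on a single relation. It also drops the ORDER before grouping, since grouping does not preserve that order, and sorts at the end instead. DUMP F would still print (130,1), because DUMP always shows records as tuples. STORE with PigStorage(' ') writes 130 1.

```pig
-- Sketch only: the question's pipeline without the tuple constructor in the final GENERATE.
-- cnt and num_users are hypothetical alias names chosen for readability.
A = LOAD 'user-links-small.txt' AS (user_a: int, user_b: int);
grouped = GROUP A BY user_a;                        -- one bag of links per user
C = FOREACH grouped GENERATE COUNT(A) AS cnt;       -- number of links per user
D = GROUP C BY cnt;                                 -- bucket users by their link count
-- No extra parentheses here, so each record has two plain fields instead of one nested tuple
E = FOREACH D GENERATE group AS cnt, COUNT(C) AS num_users;
F = ORDER E BY cnt;                                 -- optional: sort rows by the first column
STORE F INTO 'output' USING PigStorage(' ');        -- space-delimited, no parentheses or commas
```

Note that STORE will typically fail if the 'output' directory already exists, so remove it before rerunning the script.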