Reputation: 91
I have a problem summing two log files.
Example files:
file-1
id user view
1 AAA 2
2 BBB 5
3 CCC 9
file-2
id user view address
1 AAA 5 XXX
2 BBB 2 YYY
6 FFF 4 ZZZ
I want to combine the two files by id and sum the view values. Expected output:
id user view address
1 AAA 7 XXX
2 BBB 7 YYY
I tried code that joins the two files, but I can't work out how to sum the view values:
My code:
inputdata = LOAD '/user/hdfs/tes/part-1' AS (
    id:chararray,
    user:chararray,
    view:int
);
inputdata2 = LOAD '/user/hdfs/tes/part-2' AS (
    id:chararray,
    user:chararray,
    view:int,
    address:chararray
);
joined = JOIN inputdata BY id LEFT OUTER, inputdata2 BY id;
outputlist = FOREACH joined {
    GENERATE
        inputdata::id,
        inputdata::user,
        --sum(inputdata2::view),
        inputdata2::address;
}
DUMP outputlist;
My question: how do I sum the view values across the two log files?
Thanks.
Upvotes: 0
Views: 64
Reputation: 2221
Pass the join result to a FOREACH and add the two view values together. This will work:
A = LOAD 'file1.dat' using PigStorage(' ') AS (a:chararray,b:chararray,c:int);
B = LOAD 'file2.dat' using PigStorage(' ') AS (a:chararray,b:chararray,c:int,d:chararray);
C = JOIN A by a,B by a;
D = FOREACH C GENERATE A::a as id,A::b as user,A::c + B::c as view,B::d as address;
Output:
(1,AAA,7,XXX)
(2,BBB,7,YYY)
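Note that the script above uses a plain (inner) JOIN, so only ids present in both files survive, which matches the expected output in the question. If the LEFT OUTER join from the question is kept instead, the file-2 fields are null for unmatched rows, and adding a null makes the whole sum null. Below is a minimal sketch of one way to handle that. It reuses the question's paths and relation names, assumes the files are space-delimited as in the answer, and has not been run against the original data:

```pig
-- Assumption: both files are space-delimited, as in the answer above.
inputdata  = LOAD '/user/hdfs/tes/part-1' USING PigStorage(' ')
             AS (id:chararray, user:chararray, view:int);
inputdata2 = LOAD '/user/hdfs/tes/part-2' USING PigStorage(' ')
             AS (id:chararray, user:chararray, view:int, address:chararray);

-- Keep every row of file-1; file-2 fields are null when there is no match.
joined = JOIN inputdata BY id LEFT OUTER, inputdata2 BY id;

-- Treat a missing file-2 view as 0 so the sum is not null.
outputlist = FOREACH joined GENERATE
    inputdata::id    AS id,
    inputdata::user  AS user,
    inputdata::view + (inputdata2::view IS NULL ? 0 : inputdata2::view) AS view,
    inputdata2::address AS address;

DUMP outputlist;
```

With the sample data this variant would also keep id 3 (CCC) with view 9 and a null address, so use the inner join if only matching ids are wanted.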
Upvotes: 2