Reputation: 1
How to write a PIG Query to get the count of presence of values in a field?
e.g:
Field A | Field B
20|ABC;
21|XYZ;
25|null;
99|WER;
45|null;
89|FOY;
Required O/P : Count of Field A = 6, Count of Field B = 4
Upvotes: 0
Views: 521
Reputation: 4724
Pig doesn't treat the above input as null
its basically a chararray
, so all the built-in functions like(is null, is not null
) will not work in this case. You need to group all the fields, filter out the null values and get the count. Can you try the below script?
input
20|ABC;
21|XYZ;
25|null;
99|WER;
45|null;
89|FOY;
PigScript:
A = LOAD 'input' USING PigStorage('|') AS (f1:int,f2:chararray);
B = GROUP A ALL;
C = FOREACH B {
filterNull = FILTER A BY (f2!='null;');
GENERATE COUNT(A.f1) AS fieldA, COUNT(filterNull.f2) AS fieldB;
}
DUMP C;
Output:
(6,4)
Upvotes: 8
Reputation: 998
Input:
20|ABC
21|XYZ
25|null
99|WER
45|null
89|FOY
Script:
inputData = LOAD 'input' using PigStorage('|');
grouped_input = GROUP inputData ALL;
counts = FOREACH grouped_input GENERATE COUNT($1), COUNT($2);
dump counts;
Upvotes: 0
Reputation: 1
please find the answer :- my sample data is
003 Amit Delhi India 12000
004 Anil Delhi India 15000
005 Deepak Delhi India 34000
006 Fahed Agra India 45000
007 Ravi Patna India 98777
008 Avinash Punjab India 120000
009 Saajan Punjab India 54000
001 Harit Delhi India 20000
002 Hardy Agra India 20000
011 Banglore
its all separated by space
Code is as below :-
A = load '/edata' using PigStorage(' ') as (eid:int,name:chararray,city:chararray,country:chararray,salary:int);
s = group A ALL ;
result = foreach s generate COUNT(A.eid),COUNT(A.name),COUNT(A.country),COUNT(A.salary);
dump result ;
You will get following result :-
(10,9,9,9)
Upvotes: 0
Reputation: 11
Please find steps to follow to get the output
fieldcount = load '/user/examples/stackoverflow/count.txt' using PigStorage('|') as (a:int, b:chararray);
fieldcount1 = FOREACH fieldcount GENERATE a, REPLACE(b,';','') as b;
fieldcount2 = GROUP fieldcount1 ALL;
fieldcount3 = FOREACH fieldcount2 {
a_cnt = FILTER fieldcount1 BY a is not null;
b_cnt = FILTER fieldcount1 BY b is not null and b != 'null' ;
GENERATE COUNT(a_cnt) as a_count, COUNT(b_cnt) as b_count;
}
Upvotes: 0