Reputation: 11
I have table which i read via following.
A = load 'customer' using PigStorage('|');
In customer following are some of the row
7|Ron|[email protected]
8|Rina
9|Don|[email protected]
9|Don|[email protected]
10|Maya|[email protected]
11|marry|[email protected]
When i use following....
B = DISTINCT A;
A_CLEAN = FILTER B by ($0 is not null) AND ($1 is not null) AND ($2 is not null);
it removes 8|Rina as well
How to remove null rows via Pig?
Is there a way i can try A_CLEAN = filter B by not IsNULL()???
I am new to pig so not sure what i shuld put inside IsNULL...
Thanks
A_CLEAN = filter B by not IsEmpty(B);
Upvotes: 1
Views: 7274
Reputation: 4724
Tarun, instead AND condition why can't you put OR condition.
A_CLEAN = FILTER B by ($0 is not null) OR ($1 is not null) OR ($2 is not null);
This will remove all the null rows and retain if any columns is not empty.
Can you try and let me know if this works for your all conditions?
UPDATE:
I don't know why IsEmpty() is not working for you, its working for me.
IsEmpty will work only with bag, so i converted all the fields to bag and test the emptiness. see the below working code.
input.txt
7|Ron|[email protected]
8|Rina
9|Don|[email protected]
9|Don|[email protected]
10|Maya|[email protected]
11|marry|[email protected]
PigSCript:
A = LOAD 'input.txt' USING PigStorage('|');
B = DISTINCT A;
A_CLEAN = FILTER B BY NOT IsEmpty(TOBAG($0..));
DUMP A_CLEAN;
Output:
(8,Rina )
(7,Ron,[email protected])
(9,Don,[email protected])
(10,Maya,[email protected])
(11,marry,[email protected])
For your another question, its a simple math calculation
In case of AND,
8|Rina
will be treated as
($0 is not null) AND ($1 is not null) AND ($2 is not null)
(true) AND (true) AND (false)
(false) -->so this record will be skipped by Filter command
In case of OR,
8|Rina
will be treated as
($0 is not null) OR ($1 is not null) OR ($2 is not null)
(true) OR (true) OR (false)
(true) -->so this record will be included into the relation by Filter command
In case of empty record,
<empty record>
will be treated as
($0 is not null) OR ($1 is not null) OR ($2 is not null)
(false) OR (false) OR (false)
(false) -->so this record will be skipped by Filter command
Upvotes: 0
Reputation: 106
Try the following:
A = LOAD 'customer' USING PigStorage('|');
B = DISTINCT A;
A_CLEAN = FILTER B BY NOT(($0 IS NULL) AND ($1 IS NULL) AND ($2 IS NULL));
DUMP A_CLEAN;
This will produce the output:
(8,Rina )
(7,Ron,[email protected])
(9,Don,[email protected])
(10,Maya,[email protected])
(11,marry,[email protected])
In PIG, you cannot test a tuple for emptiness.
Upvotes: 2