Reputation: 89
I'm new to Pig and, while playing around with it, I hit a roadblock.
Imagine I have the following:
dump test;
(1,2014-04-08 12:09:23.0)
(2,2014-04-08 12:09:23.0)
(3,null)
(4,null)
I want to filter "test" to remove the null values so I would do something like this:
filter_test = filter test by test.column2 is not null;
To give me something like this:
(1,2014-04-08 12:09:23.0)
(2,2014-04-08 12:09:23.0)
But it returns the same thing. It doesn't remove the null rows.
I'm using Pig 10 and the date column is of type chararray.
Thanks for any help.
Upvotes: 1
Views: 1672
Reputation: 4724
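Your column2 doesn't contain real null values; it contains the string 'null' stored as a chararray, so "is not null" matches every row. Please see the examples below for the difference between null as a chararray and a real null value.
Example 1: null as chararray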
input.txt
1,2014-04-08 12:09:23.0
2,2014-04-08 12:09:23.0
3,null
4,null
Pig:
A = LOAD 'input.txt' USING PigStorage(',') AS (f1:int,f2:chararray);
B = FILTER A BY f2!='null';
DUMP B;
Output:
(1,2014-04-08 12:09:23.0)
(2,2014-04-08 12:09:23.0)
Example 2: real null value
input.txt
1,2014-04-08 12:09:23.0
2,2014-04-08 12:09:23.0
3,
4,
Pig:
A = LOAD 'input.txt' USING PigStorage(',') AS (f1:int,f2:chararray);
B = FILTER A BY f2 is not null;
DUMP B;
Output:
(1,2014-04-08 12:09:23.0)
(2,2014-04-08 12:09:23.0)
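If the input might contain both kinds of missing value (empty fields and the literal text null), the two filters can be combined into one. This is a minimal sketch that assumes the same input.txt layout and field names as the examples above; it has not been checked against your exact Pig version.

```pig
-- Same load as in the examples: f2 is read as a chararray.
A = LOAD 'input.txt' USING PigStorage(',') AS (f1:int, f2:chararray);
-- Keep only rows where f2 is neither a real null (empty field)
-- nor the literal string 'null'.
B = FILTER A BY f2 IS NOT NULL AND f2 != 'null';
DUMP B;
```

The IS NOT NULL check is arguably redundant here, because comparing a real null with 'null' evaluates to null and FILTER does not pass such rows through, but spelling it out makes the intent clearer.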
Upvotes: 1