Luis Martins

Pig 0.10: FILTER on a chararray column with IS NOT NULL doesn't work

I'm new to Pig, and while experimenting with it I've hit a roadblock.

Imagine I have the following:

dump test;

(1,2014-04-08 12:09:23.0)
(2,2014-04-08 12:09:23.0)
(3,null)
(4,null)

I want to filter "test" to remove the rows with null values, so I tried something like this:

filter_test = filter test by test.column2 is not null;

I expected this result:

(1,2014-04-08 12:09:23.0)
(2,2014-04-08 12:09:23.0)

But it returns the same four rows; the null rows are not removed.

I'm using Pig 0.10, and the date column is of type chararray.

Thanks for any help.

Answers (1)

Sivasakthi Jayaraman

Your column2 doesn't contain real null values; it contains the chararray (string) 'null'. The examples below show the difference between a real null value and the string 'null'.

Example 1: 'null' as a chararray value
input.txt

1,2014-04-08 12:09:23.0
2,2014-04-08 12:09:23.0
3,null
4,null

Pig:

A = LOAD 'input.txt' USING PigStorage(',') AS (f1:int,f2:chararray);
B = FILTER A BY f2!='null';
DUMP B;

Output:

(1,2014-04-08 12:09:23.0)
(2,2014-04-08 12:09:23.0)

Example 2: a real null value (empty field)
input.txt

1,2014-04-08 12:09:23.0
2,2014-04-08 12:09:23.0
3,
4,

Pig:

A = LOAD 'input.txt' USING PigStorage(',') AS (f1:int,f2:chararray);
B = FILTER A BY f2 is not null;
DUMP B;

Output:

(1,2014-04-08 12:09:23.0)
(2,2014-04-08 12:09:23.0)
