Mario

Reputation: 1851

Apache Pig: Replace null with string

I have a lot of null entries in my data. For later processing it would be very helpful if I could set a default value for null, namely the string "other". I couldn't find a way to do this (version 0.8.1-cdh3u4).

Also, I have some variables in my GENERATE statements that can potentially return null, and I would need something similar to the SQL DECODE function to get the "other" string instead of null.

Example:

tmp = FOREACH dump GENERATE site, REGEX_EXTRACT(name, '^(?:([^.]+)\\.?){1}', 1) AS project, ((ami MATCHES '.*datatype.*') ? REGEX_EXTRACT(name, '^(?:([^.]+)\\.?){5}', 1) : 'other') AS datatype, ami, duid, nbfiles, length, rnbfiles, rlength, name; 

Here, 'site' and 'datatype' could each return an empty string (which is valid) that is interpreted as null, but the value should be "other" instead.

Thanks a lot.

Upvotes: 1

Views: 4142

Answers (1)

Mario

Reputation: 1851

The only thing I could find was the ?: ternary operator, combined with an IS NULL check. This makes the whole Pig script a bit verbose, but it works :-)

(((ami MATCHES '.*datatype.*') ? REGEX_EXTRACT(name, '^(?:([^.]+)\\.?){5}', 1) : 'other') IS NULL ? 'other' : ((ami MATCHES '.*datatype.*') ? REGEX_EXTRACT(name, '^(?:([^.]+)\\.?){5}', 1) : 'other')) AS datatype
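One way to avoid repeating the whole expression might be to split the work into two FOREACH steps: compute the value once, then replace nulls in a second pass. This is only a sketch that I have not run on 0.8.1-cdh3u4. The alias `raw` is made up for illustration, the field list is shortened to three columns, and the `(chararray)` cast assumes `site` holds string data.

```pig
-- Step 1: compute the possibly-null value once.
-- 'raw' is a hypothetical intermediate alias; 'dump' is the relation from the question.
raw = FOREACH dump GENERATE
    site,
    ((ami MATCHES '.*datatype.*') ? REGEX_EXTRACT(name, '^(?:([^.]+)\\.?){5}', 1) : 'other') AS datatype,
    name;

-- Step 2: substitute the default wherever the value is null.
-- The (chararray) cast is an assumption so both branches of ?: have the same type.
tmp = FOREACH raw GENERATE
    ((site IS NULL) ? 'other' : (chararray)site) AS site,
    ((datatype IS NULL) ? 'other' : datatype) AS datatype,
    name;
```

The second step is the same ?: plus IS NULL pattern as above, just applied to a named field instead of the full expression, so each REGEX_EXTRACT call is written only once.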

Upvotes: 1
