Mahesh

Reputation: 190

Pig FILTER statement eliminating null records automatically

I'm not sure why Pig Latin automatically eliminates records with null values, without the programmer intending it, when I use a FILTER statement on a particular field in a dataset. Any explanation is much appreciated.

Upvotes: 0

Views: 85

Answers (1)

madbitloman

Reputation: 826

Pig omits nulls in general, which makes it a bit painful to work with corrupted data.

"Pig produces a warning for the invalid field (null), but does not halt its processing."

(from Hadoop: The Definitive Guide by Tom White)
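For the FILTER case specifically, the usual cause is that in Pig a comparison involving a null evaluates to null, and FILTER keeps only the records whose condition is true, so rows with a null in the filtered field silently disappear. The sketch below illustrates this; the input file `records.tsv` and the fields `id` and `status` are hypothetical, and it assumes empty fields are loaded as null (which is how PigStorage normally treats them).

```pig
-- Hypothetical input: tab-separated (id, status); some rows have an empty status,
-- which PigStorage normally loads as null.
A = LOAD 'records.tsv' USING PigStorage('\t') AS (id:int, status:chararray);

-- For a row with a null status, (status != 'ERROR') evaluates to null, not true,
-- so FILTER drops that row even though it is not an 'ERROR' row.
no_errors = FILTER A BY status != 'ERROR';

-- To keep the null rows, test for null explicitly in the condition.
no_errors_keep_nulls = FILTER A BY (status IS NULL) OR (status != 'ERROR');

DUMP no_errors_keep_nulls;
```

Comparing the output of `no_errors` and `no_errors_keep_nulls` on the same input is a quick way to see how many records the implicit null handling removes.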

One way to deal with such issues is either to replace the missing values with a sentinel code such as 999, or to split the data into good-quality and bad-quality records and take a look at what is going on.
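Both approaches can be sketched in Pig Latin roughly as follows. This is a minimal illustration, assuming a hypothetical file `measurements.tsv` with integer fields `id` and `reading`; 999 is just the example sentinel from the text.

```pig
-- Hypothetical input: tab-separated (id, reading).
A = LOAD 'measurements.tsv' USING PigStorage('\t') AS (id:int, reading:int);

-- Option 1: replace missing readings with a sentinel code (999 here)
-- using the bincond operator (condition ? value_if_true : value_if_false).
filled = FOREACH A GENERATE id, (reading IS NULL ? 999 : reading) AS reading;

-- Option 2: route records into separate relations so the bad ones can be inspected.
SPLIT A INTO good IF reading IS NOT NULL, bad IF reading IS NULL;

DUMP bad;
```

The sentinel only works if it can never occur as a real value; otherwise genuine 999 readings become indistinguishable from missing ones, which is one reason the SPLIT approach can be easier to audit.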

In general, we check data quality by counting missing values at various steps of the data aggregation pipeline.
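A missing-value count at one step of such a pipeline might look like the sketch below, reusing the same hypothetical `measurements.tsv` schema. It relies on COUNT_STAR counting every tuple while COUNT skips tuples whose first field is null.

```pig
A = LOAD 'measurements.tsv' USING PigStorage('\t') AS (id:int, reading:int);

-- Put all records into a single group so the counts cover the whole relation.
all_rows = GROUP A ALL;

-- COUNT_STAR counts every tuple; COUNT ignores tuples whose first field is null,
-- so COUNT(A.reading) counts only the non-null readings.
stats = FOREACH all_rows GENERATE
    COUNT_STAR(A) AS total,
    COUNT(A.reading) AS non_null,
    COUNT_STAR(A) - COUNT(A.reading) AS missing;

DUMP stats;
```

Running the same statement after each major transformation (joins, filters, aggregations) shows where records with missing values are introduced or dropped.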

Upvotes: 1
