Chenab

Reputation: 93

How would I write a Pig script that only returns records whose field is over a certain length?

The data I have is already split into fields. I just want an output file that contains two of those fields, and only for records where the title field is over a certain length. This is what I have so far:

records = LOAD '$INPUT' USING PigStorage('\t') AS (url:chararray, title:chararray, meta:chararray, copyright:chararray, aboutUSLink:chararray, aboutTitle:chararray, aboutMeta:chararray, contactUSLink:chararray, contactTitle:chararray, contactMeta:chararray, phones:chararray);
E = FOREACH records IF SIZE(title)>10 GENERATE url,title;
STORE E INTO '$OUTPUT/phoneNumbersAndTitles';

Why does the script fail at the IF?

Upvotes: 3

Views: 788

Answers (1)

cabad

Reputation: 4575

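FOREACH ... GENERATE has no IF clause, which is why the script fails at that point. You should use FILTER instead, which selects tuples from a relation based on some condition, and then project the fields you want with FOREACH: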

filtered = FILTER records BY SIZE(title) > 10;
E = FOREACH filtered GENERATE url,title;
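
For completeness, here is a sketch of how the whole script might look with the FILTER step dropped in. The schema, the $INPUT and $OUTPUT parameters, and the output path are copied from the question; the threshold of 10 is the one used there, and nothing here has been run against the real data.

```pig
-- Load the tab-separated input; the schema is taken from the question.
records = LOAD '$INPUT' USING PigStorage('\t') AS (
    url:chararray, title:chararray, meta:chararray, copyright:chararray,
    aboutUSLink:chararray, aboutTitle:chararray, aboutMeta:chararray,
    contactUSLink:chararray, contactTitle:chararray, contactMeta:chararray,
    phones:chararray);

-- Keep only the rows whose title is longer than 10 characters.
-- For a chararray, SIZE returns the number of characters.
filtered = FILTER records BY SIZE(title) > 10;

-- Project the two fields of interest.
E = FOREACH filtered GENERATE url, title;

STORE E INTO '$OUTPUT/phoneNumbersAndTitles';
```

As far as I know, rows with a null title are dropped as well, because SIZE of a null is null and FILTER only keeps tuples for which the condition is true. The two parameters can be supplied on the command line with -param, for example -param INPUT=... and -param OUTPUT=... with your own paths.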

Upvotes: 3
