shanks_roux

Reputation: 438

Apache Pig - Is it possible to serialize a variable?

Let's take the wordCount example:

input_lines = LOAD '/tmp/my-copy-of-all-pages-on-internet' AS (line:chararray);

-- Extract words from each line and put them into a pig bag
-- datatype, then flatten the bag to get one word on each row
bag_words = FOREACH input_lines GENERATE FLATTEN(TOKENIZE(line)) AS word;

Is it possible to serialize the "bag_words" variable (alias) so that we don't have to rebuild the entire bag each time we run the script?

Thanks.

Upvotes: 0

Views: 244

Answers (2)

SubSevn

Reputation: 1028

STORE bag_words INTO 'some-output-directory';

Then LOAD it in a later script, which skips the FOREACH ... GENERATE, FLATTEN and TOKENIZE work.
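A minimal sketch of that two-script workflow, assuming a writable output path `/tmp/bag_words` (hypothetical) and Pig's default PigStorage format for both STORE and LOAD. Because the stored file is plain tab-delimited text, the second script has to declare the schema again with an AS clause.

```pig
-- ===== build_words.pig: run once, or whenever the input changes =====
-- /tmp/bag_words is a hypothetical output path. STORE fails if the
-- output directory already exists, so rmf removes any previous copy first.
rmf /tmp/bag_words;
input_lines = LOAD '/tmp/my-copy-of-all-pages-on-internet' AS (line:chararray);
bag_words = FOREACH input_lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
-- With no USING clause, STORE writes tab-delimited text via PigStorage.
STORE bag_words INTO '/tmp/bag_words';

-- ===== count_words.pig: reuses the stored words, no re-tokenizing =====
-- PigStorage does not record field names or types by default, so the
-- schema is declared again here.
bag_words = LOAD '/tmp/bag_words' AS (word:chararray);
word_groups = GROUP bag_words BY word;
word_count = FOREACH word_groups GENERATE group AS word, COUNT(bag_words) AS cnt;
DUMP word_count;
```

Only the first script reads the full input; the second starts from the already-flattened words, which is the saving the question asks about.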

Upvotes: 2

davek

Reputation: 22905

You can output any alias in Pig using the STORE command: you can use standard formats (like CSV, via PigStorage with a ',' delimiter) or write your own StoreFunc (plus a matching LoadFunc) to implement any specific behaviour. You can then LOAD this output in a separate script, bypassing the original LOAD and the tokenizing steps.
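For the CSV case, a sketch continuing the question's script: PigStorage takes the field delimiter as its first argument, and its '-schema' option also writes a hidden .pig_schema file that PigStorage should read back on LOAD. The output path `/tmp/bag_words_csv` is hypothetical, and if your Pig release does not support '-schema', drop it and add `AS (word:chararray)` to the LOAD instead.

```pig
-- Writing side (appended to the question's script):
-- comma-delimited text plus a hidden .pig_schema file describing the fields.
STORE bag_words INTO '/tmp/bag_words_csv' USING PigStorage(',', '-schema');

-- Reading side, in a separate script: PigStorage should pick up the
-- stored .pig_schema, so the field is named "word" without an AS clause.
bag_words = LOAD '/tmp/bag_words_csv' USING PigStorage(',');
DESCRIBE bag_words;
```

A plain comma delimiter should be safe here because TOKENIZE already splits on commas, so no stored word contains one. A custom StoreFunc/LoadFunc pair is only worth writing when the built-in storage functions (PigStorage, BinStorage) don't cover the format you need.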

Upvotes: 0
