Matthew Moisen

Reputation: 18279

PigStorage and Variable Schemas from Input

I have a comma-separated text file like this:

1,abc,1,
2,def,1,2,3,4
3,ghi,1,2
4,jkl,1,5,6,7,8,9
5,mno

Every line always has the first two values, followed by zero or more additional values after the second comma.

How can I load this data and give an alias to the first two values?

I can load it without naming the first two values:

A = LOAD 'data.txt' USING PigStorage(',');

From here, I can write B = FOREACH A GENERATE $0 AS foo:chararray, $1 AS bar:chararray; but that discards the remaining values. Ideally I would use a wildcard and put the rest into a tuple.

Is there any way to do this?

Upvotes: 0

Views: 181

Answers (3)

Tanveer

Reputation: 900

Alternatively, you can create a map for the remaining fields.

Upvotes: 0

IronManZ

Reputation: 76

Try this, using a project-range expression ($2.. means "field 2 through the last field"):

B = foreach A generate $0 as foo:chararray, $1 as bar:chararray, $2..;

Reference: Drop single column in Pig
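A minimal end-to-end sketch of the answer above, assuming the file is named data.txt as in the question and that your Pig version supports project-range expressions (added in Pig 0.9). The explicit (chararray) casts stand in for the inline AS foo:chararray type declarations; the trailing fields stay flat rather than being wrapped in a tuple.

```pig
-- Load without a schema: each record becomes a tuple whose length
-- depends on how many comma-separated values the line contains.
A = LOAD 'data.txt' USING PigStorage(',');

-- Name and type the first two fields; $2.. projects every field from
-- position 2 to the end of the record (project-range, Pig 0.9+).
B = FOREACH A GENERATE
        (chararray) $0 AS foo,
        (chararray) $1 AS bar,
        $2..;

DUMP B;
```

Because the number of trailing fields varies per line, Pig cannot give B a complete schema, so refer to the extra fields positionally ($2, $3, ...) in later statements.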

Upvotes: 3

Othman

Reputation: 3018

I am not sure exactly what you need.

Try this:

A = LOAD 'data.txt' USING PigStorage(',') AS (foo:chararray, bar:chararray);

This ignores any values after the second comma on each line.
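If the extra values should be kept in a tuple rather than dropped, one alternative (not from any of the answers here) is to read each line whole with TextLoader and split it with the built-in STRSPLIT. The aliases raw, parts, t and rest are hypothetical names chosen for illustration. The sketch assumes Pig returns null when t.$2 does not exist (for example on the line 5,mno); Pig normally does this and logs an ACCESSING_NON_EXISTENT_FIELD warning.

```pig
-- Read each line as a single chararray field.
raw = LOAD 'data.txt' USING TextLoader() AS (line:chararray);

-- A limit of 3 yields at most three parts: the first value,
-- the second value, and the unsplit remainder of the line.
parts = FOREACH raw GENERATE STRSPLIT(line, ',', 3) AS t;

-- Name the first two values and split the remainder into its own tuple.
-- Assumption: on a line with only two values, t.$2 is null (Pig warns
-- about the non-existent field), so STRSPLIT returns null for rest.
B = FOREACH parts GENERATE
        (chararray) t.$0 AS foo,
        (chararray) t.$1 AS bar,
        STRSPLIT((chararray) t.$2, ',') AS rest;

DUMP B;
```

Unlike the project-range approach, this gives B a known schema (foo, bar, rest), so later statements can refer to rest by name.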

Upvotes: 0
