sigpwned

Reputation: 7413

Creating single-column tuples in Pig?

I'm trying to use a FOREACH .. GENERATE statement to generate a relation whose sole value is a single-column tuple. For illustration, I'm trying to do the following:

x = LOAD 'data.json' USING JsonLoader('a:chararray, b:chararray') AS (a:chararray, b:chararray);
y = foreach x generate (a) as value: (a: chararray);

However, this code yields the following error:

Incompatable field schema: declared is "value:tuple(a:chararray)", infered is "a:chararray"

Wrapping (a) in more parentheses makes no difference. Using tuple(a) is a syntax error, since the tuple syntax is only valid in the context of types.

Modifying the code slightly works, however:

y = foreach x generate (a, 0) as value: (a: chararray, b: int);

This suggests that syntactically there is no way to create a single-valued tuple in Pig. This is a real shame -- it's a very useful pattern.

Is there a way to create a single-column tuple in Pig that I'm missing?

Upvotes: 0

Views: 437

Answers (1)

Suraj Nayak

Reputation: 907

The way you are generating a is not correct. It should be as follows:

x = LOAD 'data.json' USING JsonLoader('a:chararray, b:chararray') AS (a:chararray, b:chararray);
y = foreach x generate (a) as value:chararray;
dump y;

If you want to generate a tuple, you can use the built-in TOTUPLE UDF:

y = foreach x generate TOTUPLE(a) as value:(a:chararray);

The problem in your code is that (a) evaluates to a plain chararray, and you were trying to declare it as a tuple. The type declared in the AS clause must be compatible with (or equal to) the type of the generated expression.
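One way to check the difference yourself is to compare the schemas Pig reports for the two forms. This is only a sketch that reuses the data.json load from the question; the relation names s and t are made up for illustration.

```pig
x = LOAD 'data.json' USING JsonLoader('a:chararray, b:chararray') AS (a:chararray, b:chararray);

-- (a) is treated as the field a itself, so it is declared as a chararray
s = FOREACH x GENERATE (a) AS value:chararray;

-- TOTUPLE wraps the field in a one-element tuple, so a tuple schema is accepted
t = FOREACH x GENERATE TOTUPLE(a) AS value:(a:chararray);

-- Print the schema of each relation to compare them
DESCRIBE s;
DESCRIBE t;
```

DESCRIBE prints only the schema, so this should work without running a full job over the data.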

Upvotes: 2
