Here is my sample bag data; the file is named bag.txt:
{(8,9),(0,1)},{(8,9),(1,1)}
{(2,3),(4,5)},{(2,3),(4,5)}
{(6,7),(3,7)},{(2,2),(3,7)}
I want to load this data in the Apache Pig shell. When I load it using
A = LOAD '/home/mvsubhash/Desktop/bag.txt' AS (B1:bag{T1:tuple(t1:int,t2:int)},B2:bag{T2:tuple(f1:int,f2:int)});
the result I get is:
({(8,9),(0,1)},)
({(2,3),(4,5)},)
({(6,7),(3,7)},)
In the result above, the second bag (B2) is not loaded; it comes out empty.
This data is not in a format Pig can process directly: the separator between tuples inside a bag is the same character as the separator between fields in a row, i.e. ','. So this can only be done using a UDF.
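To see the ambiguity concretely, here is a small Python sketch (plain Python, not Pig's loader). It compares a naive split on every ',' with a split that only honours commas outside `{...}` and `(...)`. Both function names are hypothetical, chosen for illustration.

```python
def naive_split(line, delim=","):
    # Splits on every delimiter, including the ones inside bags and tuples.
    return line.split(delim)


def split_top_level(line, delim=","):
    # Splits only on delimiters at nesting depth 0, i.e. not inside {...} or (...).
    fields, depth, current = [], 0, []
    for ch in line:
        if ch in "{(":
            depth += 1
        elif ch in "})":
            depth -= 1
        if ch == delim and depth == 0:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


line = "{(8,9),(0,1)},{(8,9),(1,1)}"
print(naive_split(line))      # ['{(8', '9)', '(0', '1)}', '{(8', '9)', '(1', '1)}']
print(split_top_level(line))  # ['{(8,9),(0,1)}', '{(8,9),(1,1)}']
```

A single-character delimiter has no notion of nesting, which is why a brace-aware split like the second function needs custom code.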
However, if the fields in each record are separated by a semicolon (;) instead, we can load the data using:
grunt> Data = load 'bag_data' using PigStorage(';') as (Bag1:bag{tuple1:(A1:int,A2:int)},Bag2:bag{tuple2:(B1:int,B2:int)});
grunt> Dump Data;
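If the file cannot be produced with ';' in the first place, one option is to rewrite it before loading. Below is a minimal Python sketch that replaces only the commas sitting outside `{...}` and `(...)`. The helper names are hypothetical, and the paths `bag.txt` and `bag_data` are assumptions taken from the question and the LOAD statement above; adjust them to your setup.

```python
def to_semicolon_fields(line):
    # Rewrite one record so that commas at nesting depth 0 (between bags)
    # become ';', while commas inside bags and tuples are left untouched.
    out, depth = [], 0
    for ch in line:
        if ch in "{(":
            depth += 1
        elif ch in "})":
            depth -= 1
        out.append(";" if ch == "," and depth == 0 else ch)
    return "".join(out)


def convert_file(src_path, dst_path):
    # Hypothetical helper: read src_path line by line and write the
    # ';'-separated version of each record to dst_path.
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(to_semicolon_fields(line.rstrip("\n")) + "\n")


if __name__ == "__main__":
    print(to_semicolon_fields("{(8,9),(0,1)},{(8,9),(1,1)}"))
    # {(8,9),(0,1)};{(8,9),(1,1)}
```

Running `convert_file("bag.txt", "bag_data")` and then the `PigStorage(';')` LOAD above should give both bags, assuming `bag_data` is placed where Pig resolves that path.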
By default, PigStorage uses tab (\t) as the field delimiter when reading a file. Since your records are comma-separated bags, try using PigStorage(','):
A = LOAD '/home/mvsubhash/Desktop/bag.txt' USING PigStorage(',') AS (B1:bag{},B2:bag{});
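To illustrate the point about the default delimiter, here is a rough Python simulation (not Pig's actual loader). The records in bag.txt contain no tabs, so splitting on '\t' yields a single field. With a two-field schema, nothing is left for the second name. It is assumed here that a missing field becomes null (`None` below), which matches B2 being empty in the question's output. The function name is hypothetical.

```python
def fields_for_schema(line, delim, names):
    # Assign delimited fields to schema names in order; names with no
    # corresponding field get None (assumed to mirror a null in Pig).
    parts = line.split(delim)
    return {name: (parts[i] if i < len(parts) else None)
            for i, name in enumerate(names)}


line = "{(8,9),(0,1)},{(8,9),(1,1)}"
print(fields_for_schema(line, "\t", ["B1", "B2"]))
# {'B1': '{(8,9),(0,1)},{(8,9),(1,1)}', 'B2': None}
```

In this simulation the whole line lands in B1. How Pig then casts that text to a bag is outside what the sketch models.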