Reputation: 79

How to load bag data in Apache Pig

Here is my sample bag data, stored in a file named bag.txt:

{(8,9),(0,1)},{(8,9),(1,1)}
{(2,3),(4,5)},{(2,3),(4,5)}
{(6,7),(3,7)},{(2,2),(3,7)}

I want to load this data in the Apache Pig shell. I am loading it with:

A = LOAD '/home/mvsubhash/Desktop/bag.txt' AS (B1:bag{T1:tuple(t1:int,t2:int)},B2:bag{T2:tuple(f1:int,f2:int)});

But my final result looks like this:

({(8,9),(0,1)},)
({(2,3),(4,5)},)
({(6,7),(3,7)},)

In the above result, the second bag is not being loaded.

Upvotes: 0

Answers (2)

Kunal Wadhwa

Reputation: 49

This data is not in a format that Pig can process as-is: the separator between tuples inside a bag is the same as the separator between fields in a row (','), so it can only be loaded with a UDF.

However, if the fields in each row are separated by a semicolon (';') instead, we can load the data with:

grunt> Data = load 'bag_data' using PigStorage(';') as (Bag1:bag{tuple1:(A1:int,A2:int)},Bag2:bag{tuple2:(B1:int,B2:int)});
grunt> Dump Data;
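
If the file already exists with commas between the bags, one way to get the semicolon-separated form is to rewrite it before loading. The following is only a minimal sketch in Python, not part of Pig: the helper name `to_semicolon_fields` is hypothetical, and it assumes every bag and tuple in the file is well-formed, so that a comma outside all braces and parentheses is always a field separator.

```python
def to_semicolon_fields(line, old=",", new=";"):
    """Replace separators that sit outside any {...} or (...) with `new`."""
    depth = 0
    out = []
    for ch in line:
        if ch in "{(":
            depth += 1
        elif ch in "})":
            depth -= 1
        # Only a comma at nesting depth 0 separates two fields of the row;
        # commas inside a bag or tuple are left untouched.
        out.append(new if ch == old and depth == 0 else ch)
    return "".join(out)


# Sample rows taken from bag.txt in the question.
rows = [
    "{(8,9),(0,1)},{(8,9),(1,1)}",
    "{(2,3),(4,5)},{(2,3),(4,5)}",
    "{(6,7),(3,7)},{(2,2),(3,7)}",
]

for row in rows:
    print(to_semicolon_fields(row))
```

Each printed row has the form {(8,9),(0,1)};{(8,9),(1,1)}, which is the layout the PigStorage(';') load above expects.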

Upvotes: 0

salmanbw

Reputation: 1311

By default, Pig uses tab (\t) as the delimiter when reading a file. Since your records are comma-separated bags, try using PigStorage(','):

A = LOAD '/home/mvsubhash/Desktop/bag.txt' USING PigStorage(',') AS (B1:bag{},B2:bag{});

Upvotes: 0
