user2816456

Reputation: 597

Hadoop: Create empty bag if field is null in Pig

My data has the following schema:

id: long,
list: {(itemId: long, itemName: chararray)}

In my data, list is either a bag of tuples or null. I would like to replace the null with an empty bag (a bag with 0 elements).

I tried something like:

answer = FOREACH data
 GENERATE (list is null ? {} : list) AS list;

Pig complains that {} and list do not have compatible schemas. How can I create an empty bag whose schema is compatible with list?

I ended up doing the following, and it worked:

answer = FOREACH data
 GENERATE (list is null ? (bag{tuple(long,chararray)}){} : list) AS list:{(itemId: long, itemName: chararray)};
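For context, here is a minimal sketch of a complete script around that statement. The input path 'input.txt', the tab-delimited PigStorage loader, and the LOAD schema are assumptions for illustration. The key point is unchanged: the empty bag constant is cast to the same element type as list, so both branches of the bincond have compatible schemas.

```pig
-- Hypothetical input file 'input.txt': tab-separated id and bag, e.g.
--   1<TAB>{(10,apple),(11,pear)}
-- Lines whose bag field is empty are assumed to load with list = null.
data = LOAD 'input.txt' USING PigStorage('\t')
       AS (id: long, list: {(itemId: long, itemName: chararray)});

-- Cast the empty bag constant to bag{tuple(long, chararray)} so that
-- both branches of the ?: expression have the same type.
answer = FOREACH data GENERATE
    id,
    (list IS NULL ? (bag{tuple(long, chararray)}){} : list)
        AS list: {(itemId: long, itemName: chararray)};

-- DESCRIBE prints the output schema; DUMP prints the rows.
DESCRIBE answer;
DUMP answer;
```

Declaring the output schema explicitly with AS keeps the field names itemId and itemName, which the bare cast on its own does not carry.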

Upvotes: 3

Views: 6996

Answers (3)

edrabc

Reputation: 1279

If you can use it, take a look at the Apache DataFu project: http://datafu.incubator.apache.org

It provides many useful UDFs for Pig, including datafu.pig.bags.NullToEmptyBag(), which does exactly what you are looking for:

DEFINE NullToEmptyBag datafu.pig.bags.NullToEmptyBag();
...
answer = FOREACH data GENERATE NullToEmptyBag(list) AS list...;
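A fuller sketch of the DataFu approach is shown below, with the REGISTER step that any UDF jar needs. The jar path, the input path 'input.txt', and the LOAD schema are hypothetical placeholders; only the UDF class name datafu.pig.bags.NullToEmptyBag comes from the answer above.

```pig
-- Hypothetical path: point this at the DataFu jar you downloaded.
REGISTER /path/to/datafu.jar;

DEFINE NullToEmptyBag datafu.pig.bags.NullToEmptyBag();

-- Hypothetical input path; the schema matches the question.
data = LOAD 'input.txt'
       AS (id: long, list: {(itemId: long, itemName: chararray)});

-- NullToEmptyBag passes a non-null bag through and turns null into {}.
answer = FOREACH data GENERATE id, NullToEmptyBag(list) AS list;

DUMP answer;
```

Compared with the bincond-and-cast approach, the UDF keeps the script shorter. The trade-off is an extra jar dependency.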

Upvotes: 3

Dheeraj R

Reputation: 711

An empty tuple is written as ().

An empty bag is written as {}.

An empty map is written as [].

Upvotes: 0

Donald Miner

Reputation: 39893

On its own, {} has no type. A bag always has a tuple type inside it, and list and your empty bag need to have the same type.

Unfortunately, I don't have Pig set up in a way that lets me test this for you, and I'm not sure exactly how to do it, but it will be something along these lines. I couldn't find good documentation on how to set the type of a bag.

Perhaps try this:

answer = FOREACH data
GENERATE (list is null ? (bag{tuple(long,chararray)}){} : list) AS list;

Upvotes: 5
