Combine and flatten many key/value tuples into a single tuple in pig

Question

I am using Pig 0.8.1. I am somewhat new to Pig but I know there must be a reasonable and re-usable solution for how I want to work with my tuples. I have the following format (similar to triples):

Schema: (uuid, key, val)

Data:
(id1, 'name', 'Corey')
(id1, 'location', 'USA')
(id1, 'carsOwned', 5)
(id2, 'name', 'Paul')
(id2, 'location', 'CANADA')
(id2, 'carsOwned', 10)

The reason I'm representing this data in triples is because it's possible to have multi-valued keys, so pushing the data into a map is out of the question.

What I need to be able to do is find the ids, names and locations of the people with the top 10 cars owned. I'd like it if my output format could be this when sorted in descending order:

Schema: (uuid, name, location, carsOwned)

Data:
(id2, 'Paul', 'CANADA', 10)
(id1, 'Corey', 'USA', 5)

I have tried filtering my input into 3 different aliases (one where key == 'name', one where key == 'location' and one where key == 'carsOwned') so that I can use JOIN and bring them back into one tuple, but it appears that Pig ends up loading from the inputFormat 3 times instead of one. Maybe I'm doing that wrong?

I also tried grouping by id but then I can't seem to find a reasonable way to work with the bag of the actual triple key/values since they all have the exact same schema.

What I really want is to group by the id field and then flatten each of the keys but rename the alias to the actual name of the key.

Any ideas? Thanks in advance!

Combine and flatten many key/value tuples into a single tuple in pig

Answers (1)

Related Questions