Reputation: 4208

Apache Pig: Easier way to filter by a bunch of values from the same field

Say I want select a subset of data by their values of from the same field. Right now I have to do something like this

TestLocationsResults = FILTER SalesData by (StoreId =='17'
or StoreId =='85'
or StoreId =='12'
or StoreId =='45'
or StoreId =='26'
or StoreId =='75'
or StoreId =='13'
)

in SQL, we can simply do this :

SELECT * FROM SalesData where StoreID IN (17, 12, 85, 45, 26, 75, 13)

Is there a similiar shortcut in Pig that I am missing?

Upvotes: 1

Answers (5)

Tim Hopper

Reputation: 925

It looks like Pig 0.12 added an IN operator.

So you can do

FILTER SalesData BY StoreID IN (17, 12, 85, 45, 26, 75, 13);

Upvotes: 3

Eli

Reputation: 39009

The way you're currently doing it is the best way to do it in Pig. All of the alternatives to what you're doing now are either hacky, slow, or both. Hopefully, Pig adds an "in" query in a future version, but for now you're doing it the best way available.

Upvotes: 1

alexeipab

Reputation: 3619

One work around could be to use a built in function "INDEXOF" Ex:

TestLocationsResults = FILTER SalesData by INDEXOF(',17,12,85,45,26,75,13,', CONCAT(CONCAT(',', StoreId), ',')) > -1;

Amended to take into account the comment, introduce the ',' symbols around StoreId to have the exact match and not a partial

Upvotes: 1

reo katoa

Reputation: 5811

My solution to this, when the data type is chararray, is to use a regular expression:

TestLocationsResults = FILTER SalesData by StoreID MATCHES '(17|12|85|45|26|75|13)';

When the data type is an int, you could try casting to a chararray.

Upvotes: 1

Chris White

Reputation: 30089

There is no IN keyword in Pig to do this sort of set membership detection.

One suggestion if to write a UDF (as seen in this question / answer).

Another could be to create a relationship with values for each StoreId you want to filter by and then perform an inner join on the two relationships.

Upvotes: 1

Apache Pig: Easier way to filter by a bunch of values from the same field

Answers (5)

Related Questions