yoni keren
yoni keren

Reputation: 380

How to efficiently select rows based on multiple column values present in a list (of tuples)?

I have a DataFrame with two columns store_id and product_store_id whose values I need to check against a list of tuples

products_list = [('ebay','123'),('amazon','789'),..] 

efficiently, and select the rows containing just the rows of products described in that list?

I've tried products.loc[products[['store_id','product_store_id']].isin(products_list)] but pandas doesn't like that (ValueError: Cannot index with multidimensional key)

How do i efficiently select all of the houses where store_id and product_store_id are in the list?

Upvotes: 0

Views: 398

Answers (1)

cs95
cs95

Reputation: 402443

There are ways to do this, some more hacky than others. My recommendation is to generate a MultiIndex, these work nicely with a list of tuples:

# <=0.23
idx = pd.MultiIndex.from_arrays([
    products['store_id'], products['product_store_id']])
# 0.24+
idx = pd.MultiIndex.from_frame(products[['store_id', 'product_store_id']])

products.loc[idx.isin(products_list)]

Another option is concatenation and filtering,

products_list_concat = [''.join(l) for l in products_list]
mask = ((products['store_id'] + products['product_store_id'])
           .isin(products_list_concat))

products.loc[mask]

Upvotes: 1

Related Questions