Reputation: 380
I have a DataFrame with two columns store_id
and product_store_id
whose values I need to check against a list of tuples
products_list = [('ebay','123'),('amazon','789'),..]
efficiently, and select the rows containing just the rows of products described in that list?
I've tried products.loc[products[['store_id','product_store_id']].isin(products_list)]
but pandas doesn't like that (ValueError: Cannot index with multidimensional key)
How do i efficiently select all of the houses where store_id
and product_store_id
are in the list?
Upvotes: 0
Views: 398
Reputation: 402443
There are ways to do this, some more hacky than others. My recommendation is to generate a MultiIndex
, these work nicely with a list of tuples:
# <=0.23
idx = pd.MultiIndex.from_arrays([
products['store_id'], products['product_store_id']])
# 0.24+
idx = pd.MultiIndex.from_frame(products[['store_id', 'product_store_id']])
products.loc[idx.isin(products_list)]
Another option is concatenation and filtering,
products_list_concat = [''.join(l) for l in products_list]
mask = ((products['store_id'] + products['product_store_id'])
.isin(products_list_concat))
products.loc[mask]
Upvotes: 1