Reputation: 203
I am trying to select all items in a pandas dataframe whose label contain a certain string:
df.query(' ITEM in @label.keys() and "H2O" in @label[ITEM]')
label is a dictionary that maps items to their text labels.
However when I do this, I get:
TypeError: 'Series' objects are mutable, thus they cannot be hashed
I know that one way to work around this is to create another dictionary with items that have "H2O" in their label, let's call it label_H2O, then the query can become:
df.query(' ITEM in @label_H2O.keys()')
However the first way is more convenient as I can do everything in a single query.
Is it possible to do this in a single query as I was trying to do in my first approach? Also if there is a way, would it be slower than the second approach?
I am not sure how the expression gets evaluated, two possibilities I can think of:
In case 1, it seems both queries should have similar performance. In case 2, the second query is much more efficient especially if we have many repeated items.
df header:
SUBJECT_ID ITEM
0 1 3
1 2 5
2 1 5
3 1 2
4 1 2
label:
{
1: 'Coffee',
2: 'Apple Juice',
3: 'Soda',
4: 'Tea',
5: 'Sparkling H2O'
}
Upvotes: 1
Views: 330
Reputation: 2017
I'm not sure from your question if the intent is to get the result or if you are married to the idea of using query
and just curious how to get it to work.
If you're just looking for a way to get the rows from df
that have a key that corresponds to a label that has 'H2O' in it:
df[df['item'].isin([k for k,v in label.items() if 'H2O' in v])]
Upvotes: 1