Ibrahim
Ibrahim

Reputation: 203

Using MySQL like conditions when querying pandas dataframe column values

I am trying to select all items in a pandas dataframe whose label contain a certain string:

df.query(' ITEM in @label.keys() and "H2O" in @label[ITEM]')

label is a dictionary that maps items to their text labels.

However when I do this, I get:

TypeError: 'Series' objects are mutable, thus they cannot be hashed

I know that one way to work around this is to create another dictionary with items that have "H2O" in their label, let's call it label_H2O, then the query can become:

df.query(' ITEM in @label_H2O.keys()')

However the first way is more convenient as I can do everything in a single query.

Is it possible to do this in a single query as I was trying to do in my first approach? Also if there is a way, would it be slower than the second approach?

I am not sure how the expression gets evaluated, two possibilities I can think of:

  1. Pass through the whole dataframe and get all unique items -> filter them based on the additional conditions -> store them in a set -> pass through each row in the dataframe and check if item is in this set and select the row accordingly
  2. Pass through each row and evaluate the conditions

In case 1, it seems both queries should have similar performance. In case 2, the second query is much more efficient especially if we have many repeated items.

df header:

   SUBJECT_ID  ITEM
0           1     3
1           2     5
2           1     5
3           1     2
4           1     2

label:

{
 1: 'Coffee',
 2: 'Apple Juice',
 3: 'Soda',
 4: 'Tea',
 5: 'Sparkling H2O'
}

Upvotes: 1

Views: 330

Answers (1)

it's-yer-boy-chet
it's-yer-boy-chet

Reputation: 2017

I'm not sure from your question if the intent is to get the result or if you are married to the idea of using query and just curious how to get it to work.

If you're just looking for a way to get the rows from df that have a key that corresponds to a label that has 'H2O' in it:

df[df['item'].isin([k for k,v in label.items() if 'H2O' in v])]

Upvotes: 1

Related Questions