Reputation: 7581
I want to drop specific rows from a pandas DataFrame. Usually you can do that with something like
df[df['some_column'] != 1234]
The expression df['some_column'] != 1234
builds a boolean Series that is used to index df, so only the rows where the value is True
are kept.
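To make the masking step concrete, here is a minimal sketch. The column contents and the second column name `other` are made up for illustration; only `some_column` and the value 1234 come from the example above.

```python
import pandas as pd

# Hypothetical data; only 'some_column' and 1234 come from the example above.
df = pd.DataFrame({'some_column': [1234, 5, 1234, 7],
                   'other': ['a', 'b', 'c', 'd']})

# The comparison is evaluated element-wise and yields a boolean Series.
mask = df['some_column'] != 1234
print(mask.tolist())  # [False, True, False, True]

# Indexing with the mask keeps only the rows where the mask is True.
filtered = df[mask]
print(filtered)
```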
But in some cases, like mine, I don't see how I can express the condition in such a way, and iterating over pandas rows is way too slow to be considered a viable option.
To be more specific, I want to drop all rows where the value of a column is also a key in a dictionary, in the same style as the example above.
In a perfect world I would consider something like
df[df['some_column'] not in my_dict.keys()]
which obviously does not work. Any suggestions?
Upvotes: 5
Views: 416
Reputation: 535
What you're looking for is isin()
import pandas as pd
df = pd.DataFrame([[1, 2], [1, 3], [4, 6], [5, 7], [8, 9]], columns=['A', 'B'])
In[9]: df
Out[9]:
   A  B
0  1  2
1  1  3
2  4  6
3  5  7
4  8  9
mydict = {1: 'A', 8: 'B'}
df[df['A'].isin(mydict.keys())]
Out[11]:
   A  B
0  1  2
1  1  3
4  8  9
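Note that the output above keeps the rows whose value is a dictionary key, whereas the question asks to drop them. A minimal sketch of the inverse, reusing the same df and mydict and negating the mask with ~ (the variable names in_dict and dropped are only illustrative):

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 3], [4, 6], [5, 7], [8, 9]], columns=['A', 'B'])
mydict = {1: 'A', 8: 'B'}

# True for rows whose 'A' value is one of the dictionary keys.
in_dict = df['A'].isin(mydict.keys())

# ~ inverts the boolean mask, so the matching rows are dropped.
dropped = df[~in_dict]
print(dropped)
```

With this data, only the rows at index 2 and 3 should remain.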
Upvotes: 2
Reputation: 859
You can use the isin() method
to select rows whose column value is in an iterable; negate the mask with ~ to drop those rows instead.
my_list = ['my', 'own', 'data']
df.loc[df['column'].isin(my_list)]
my_dict = {'key1': 'Some value'}
df.loc[df['column'].isin(my_dict.keys())]
Upvotes: 1
Reputation: 29711
You could use query
for this purpose:
keys = list(my_dict.keys())
df.query('some_column != @keys')
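For reference, a self-contained sketch of the query approach. The sample data is made up for illustration, and the keys are first copied into a plain list (the hypothetical name keys) so that the query string can reference it with the @ prefix; not in is used here as the explicit form of the comparison.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({'some_column': [1, 4, 5, 8], 'B': [2, 6, 7, 9]})
my_dict = {1: 'A', 8: 'B'}

# query() can reference local variables with the @ prefix.
keys = list(my_dict)
result = df.query('some_column not in @keys')
print(result)
```

Whether this is preferable to the ~isin() form is mostly a matter of taste; both produce a filtered copy without iterating over rows in Python.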
Upvotes: 1