Reputation: 997
I currently do this to delete rows whose value in a specific column, 'some_column', is found in a list removal_list:
df = df[~df['some_column'].isin(removal_list)]
How can I do this if I want to compare a combination of values, say against a list of tuples? (It doesn't necessarily need to be a list of tuples if there is a better way to achieve this.)
for example:
removal_list = [(item1,store1),(item2,store1),(item2,store2)]
If df['column_1'] and df['column_2'] of a specific row have the values item1 and store1 (or match any other tuple in removal_list), then delete that row.
Also, there might be more than two columns that need to be assessed.
EDIT: a better example:
client account_type description
0 1 2 photographer
1 2 2 banker
2 3 3 banker
3 4 2 journalist
4 5 4 journalist
remove_list = [(2, 'journalist'), (3, 'banker')]
checked against the columns account_type and description.
Output:
client account_type description
0 1 2 photographer
1 2 2 banker
4 5 4 journalist
Upvotes: 3
Views: 1811
Reputation: 1381
You could use the query method with an extra column to select against.
removal_list = [(2, 'journalist'), (3, 'banker')]
df['removal_column'] = df.apply(lambda x: (x.account_type, x.description), axis='columns')
df = df.query('removal_column not in @removal_list').drop('removal_column', axis='columns')
Upvotes: 0
Reputation: 294508
If the index were set to ['account_type', 'description'], we could use the drop method.
df.set_index(['account_type', 'description']).drop(remove_list).reset_index()
account_type description client
0 2 photographer 1
1 2 banker 2
2 4 journalist 5
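Two things to watch with this approach: by default drop raises a KeyError when one of the tuples is not present in the index, and reset_index moves the key columns to the front. A minimal sketch of a more forgiving variant, assuming the example frame from the question plus a hypothetical extra pair (9, 'pilot') that is not in the data:

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({
    "client": [1, 2, 3, 4, 5],
    "account_type": [2, 2, 3, 2, 4],
    "description": ["photographer", "banker", "banker", "journalist", "journalist"],
})

# (9, "pilot") is a made-up pair that does not occur in df.
remove_list = [(2, "journalist"), (3, "banker"), (9, "pilot")]

keys = ["account_type", "description"]
res = (
    df.set_index(keys)
      .drop(remove_list, errors="ignore")  # skip pairs that are not in the index
      .reset_index()[df.columns]           # restore the original column order
)
print(res)
```

Note that the original row labels are lost here, because the index is replaced by the key columns and then reset.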
Upvotes: 2
Reputation: 164783
One way is to create a Series by zipping the two columns, then use Boolean indexing. I also advise using a set instead of a list for O(1) lookup.
remove_set = {(2,'journalist'),(3,'banker')}
condition = pd.Series(list(zip(df.account_type, df.description)), index=df.index).isin(remove_set)
res = df[~condition]
print(res)
client account_type description
0 1 2 photographer
1 2 2 banker
4 5 4 journalist
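The same idea can also be sketched with a plain Python membership test, which is where the set's constant-time lookup applies directly. This is only an illustration: the non-default index labels (10 to 14) are an assumption added to show that a positional Boolean mask does not depend on index alignment.

```python
import pandas as pd

# Example frame from the question, with made-up non-default index labels.
df = pd.DataFrame(
    {
        "client": [1, 2, 3, 4, 5],
        "account_type": [2, 2, 3, 2, 4],
        "description": ["photographer", "banker", "banker", "journalist", "journalist"],
    },
    index=[10, 11, 12, 13, 14],
)

remove_set = {(2, "journalist"), (3, "banker")}

# One Boolean per row, in row order; each lookup in the set is O(1) on average.
keep = [pair not in remove_set
        for pair in zip(df["account_type"], df["description"])]

res = df.loc[keep]
print(res)
```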
Upvotes: 2
Reputation: 863301
I suggest creating a DataFrame from the list and merging it with indicator=True, then keeping only the rows that have no match:
remove_list = [(2,'journalist'),(3,'banker')]
df1 = pd.DataFrame(remove_list, columns=['account_type','description'])
print (df1)
account_type description
0 2 journalist
1 3 banker
df = df.merge(df1, how='left', indicator=True).query('_merge == "left_only"').drop(columns='_merge')
print (df)
client account_type description
0 1 2 photographer
1 2 2 banker
4 5 4 journalist
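Because merge returns a frame with a fresh index, one option is to use the merge only to build a mask and then filter the original frame. A rough sketch, assuming a hypothetical helper named anti_join and made-up index labels (10 to 14):

```python
import pandas as pd


def anti_join(df, pairs, keys):
    """Return the rows of df whose values in `keys` match none of `pairs`.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Deduplicate so the left merge yields exactly one row per row of df.
    to_remove = pd.DataFrame(pairs, columns=keys).drop_duplicates()
    merged = df[keys].merge(to_remove, on=keys, how="left", indicator=True)
    # A left merge keeps the row order of df, so the mask can be applied by position.
    keep = (merged["_merge"] == "left_only").to_numpy()
    return df[keep]


df = pd.DataFrame(
    {
        "client": [1, 2, 3, 4, 5],
        "account_type": [2, 2, 3, 2, 4],
        "description": ["photographer", "banker", "banker", "journalist", "journalist"],
    },
    index=[10, 11, 12, 13, 14],
)
remove_list = [(2, "journalist"), (3, "banker")]

res = anti_join(df, remove_list, ["account_type", "description"])
print(res)
```

This keeps the original index labels and leaves unrelated columns such as client untouched.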
Upvotes: 2
Reputation: 76366
Say you have
removal_list = [(item1,store1),(item2,store1),(item2,store2)]
Then
df[['column_1', 'column_2']].apply(tuple, axis=1)
should create a Series of tuples, and so
df[['column_1', 'column_2']].apply(tuple, axis=1).isin(removal_list)
is the Boolean condition you're after. Removal works the same way as before: negate the condition with ~ and index df with it. This should work for any number of columns.
Example
>>> df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
>>> df[['a', 'b']].apply(tuple, axis=1).isin([(1, 3), (30, 40)])
0     True
1    False
dtype: bool
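To illustrate the "any number of columns" point, here is a small sketch with three key columns. The column names (item, store, year, qty) and the data are made up for the example:

```python
import pandas as pd

# Hypothetical data: three key columns plus one value column.
df = pd.DataFrame({
    "item": ["item1", "item1", "item2", "item2"],
    "store": ["store1", "store1", "store2", "store2"],
    "year": [2020, 2021, 2021, 2020],
    "qty": [5, 3, 8, 1],
})

removal_list = [("item1", "store1", 2020), ("item2", "store2", 2021)]
cols = ["item", "store", "year"]

# Build one tuple per row from the key columns, then test membership.
mask = df[cols].apply(tuple, axis=1).isin(removal_list)
res = df[~mask]
print(res)
```

The order of cols has to match the order of the values inside each tuple of removal_list.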
Upvotes: 4