AnarKi

Reputation: 997

Deleting DataFrame rows in pandas if a combination of column values equals a tuple in a list

I currently do this to delete rows whose value in the column 'some_column' is found in a list removal_list:

df = df[~df['some_column'].isin(removal_list)]

How can I do this if I want to compare a combination of values, say against a list of tuples? (It doesn't necessarily need to be a list of tuples if there is a better way to achieve this.)

for example:

removal_list = [(item1,store1),(item2,store1),(item2,store2)]

If df['column_1'] and df['column_2'] of a given row have the values item1 and store1 (or match any other tuple in removal_list), then delete that row.

Also, there may be more than two columns that need to be checked.

EDIT: a better example:

   client  account_type   description
0       1             2  photographer
1       2             2        banker
2       3             3        banker
3       4             2    journalist
4       5             4    journalist

remove_list = [(2, 'journalist'), (3, 'banker')]

The check is on the columns account_type and description.

Output:

   client  account_type   description
0       1             2  photographer
1       2             2        banker
4       5             4    journalist

Upvotes: 3

Views: 1811

Answers (5)

Alexis Lucattini

Reputation: 1381

You could use the query method with an extra column to select against.

removal_list = [(2, 'journalist'), (3, 'banker')]

df['removal_column'] = df.apply(lambda x: (x.account_type, x.description), axis='columns')
df = df.query('removal_column not in @removal_list').drop('removal_column', axis='columns')

Upvotes: 0

piRSquared

Reputation: 294508

If the index was set to be ['account_type', 'description'], we could use the drop method.

df.set_index(['account_type', 'description']).drop(remove_list).reset_index()

   account_type   description  client
0             2  photographer       1
1             2        banker       2
2             4    journalist       5
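
For reference, a self-contained sketch of this approach on the question's sample frame. The errors="ignore" argument is my own addition, on the assumption that you would rather skip tuples that are absent from the index than get a KeyError for them. Note that reset_index() puts the key columns first and renumbers the rows.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    "client": [1, 2, 3, 4, 5],
    "account_type": [2, 2, 3, 2, 4],
    "description": ["photographer", "banker", "banker", "journalist", "journalist"],
})
remove_list = [(2, "journalist"), (3, "banker")]
key_cols = ["account_type", "description"]

res = (
    df.set_index(key_cols)                 # key columns become a MultiIndex
      .drop(remove_list, errors="ignore")  # drop rows whose index tuple is listed
      .reset_index()                       # move the keys back into columns
)
print(res)
```

If the original column order or row labels matter, one of the Boolean-mask answers may be a better fit.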

Upvotes: 2

jpp

Reputation: 164783

One way is to create a series by zipping the 2 columns, then use Boolean indexing. I also advise using a set instead of a list for O(1) lookup.

remove_set = {(2,'journalist'),(3,'banker')}

condition = pd.Series(list(zip(df.account_type, df.description)), index=df.index).isin(remove_set)

res = df[~condition]

print(res)

   client  account_type   description
0       1             2  photographer
1       2             2        banker
4       5             4    journalist
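
When building the mask this way, it helps to give the Series the same index as df. A Series built from a plain list gets a default 0..n-1 index, which only lines up with df if df has the default index too. A minimal sketch on the question's data, with made-up index labels (10, 20, ...) to show the alignment:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "client": [1, 2, 3, 4, 5],
        "account_type": [2, 2, 3, 2, 4],
        "description": ["photographer", "banker", "banker", "journalist", "journalist"],
    },
    index=[10, 20, 30, 40, 50],  # hypothetical non-default index
)
remove_set = {(2, "journalist"), (3, "banker")}

# One tuple per row, labelled with the same index as df so the mask aligns.
condition = pd.Series(
    list(zip(df["account_type"], df["description"])),
    index=df.index,
).isin(remove_set)

res = df[~condition]
print(res)
```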

Upvotes: 2

jezrael

Reputation: 863301

I suggest creating a DataFrame from the list, merging it with indicator=True, and then dropping the rows that matched:

remove_list = [(2,'journalist'),(3,'banker')]

df1 = pd.DataFrame(remove_list, columns=['account_type','description'])
print (df1)
   account_type description
0             2  journalist
1             3      banker

df = df.merge(df1, how='left', indicator=True).query('_merge != "both"').drop('_merge', axis=1)
print (df)
   client  account_type   description
0       1             2  photographer
1       2             2        banker
4       5             4    journalist

Upvotes: 2

Ami Tavory

Reputation: 76366

Say you have

removal_list = [(item1,store1),(item2,store1),(item2,store2)]

Then

df[['column_1', 'column_2']].apply(tuple, axis=1)

should create a Series of tuples, and so

df[['column_1', 'column_2']].apply(tuple, axis=1).isin(removal_list)

is the Boolean condition you're after. Removal works the same way as before. This should work for any number of columns.

Example

>>> df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
>>> df[['a', 'b']].apply(tuple, axis=1).isin([(1, 3), (30, 40)])
0     True
1    False
dtype: bool
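
Since the question mentions that more than two columns may be involved, here is one possible way to wrap the idea in a helper. This sketch swaps apply(tuple, axis=1) for pd.MultiIndex.from_frame(...).isin(...), which is a substitute of mine rather than the code above. The function name drop_matching_rows and the extra "region" column are made up for illustration.

```python
import pandas as pd

def drop_matching_rows(df, columns, combos):
    """Return df without the rows whose values in `columns` form a tuple in `combos`."""
    keys = pd.MultiIndex.from_frame(df[columns])  # one key tuple per row
    mask = keys.isin(list(combos))                # NumPy boolean array, one entry per row
    return df[~mask]

df = pd.DataFrame({
    "client": [1, 2, 3, 4, 5],
    "account_type": [2, 2, 3, 2, 4],
    "description": ["photographer", "banker", "banker", "journalist", "journalist"],
    "region": ["north", "south", "north", "south", "north"],  # hypothetical third column
})

res = drop_matching_rows(
    df,
    ["account_type", "description", "region"],
    [(2, "journalist", "south"), (3, "banker", "south")],
)
print(res)
```

Because the mask is a plain array rather than a Series, it is applied by position, so the original row labels of df are kept as they are.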

Upvotes: 4
