Reputation: 3996
I have a dictionary mapping tuples to lists of values, for example:
{
('book1', 'US'): [1, 5],
('book2', 'CA'): [3]
}
The tuple represents column values in a pandas DataFrame
(there might be other columns as well), and the list represents values in a single column of the same DataFrame.
Let's say my dataframe looks something like this:
book country value
0 book1 US 1
1 book1 US 9
2 book1 US 5
3 book2 MX 7
4 book2 CA 3
5 book1 CA 1
I would like to either delete the rows that the dict
above represents or change the value of those rows to 0.
So the result will be either:
book country value
0 book1 US 0
1 book1 US 9
2 book1 US 0
3 book2 MX 7
4 book2 CA 0
5 book1 CA 1
Or:
book country value
0 book1 US 9
1 book2 MX 7
2 book1 CA 1
What is the best approach for doing this?
I want to do this on a pretty large dataframe and it should be as efficient as possible.
My idea was to do something like the code below, but it doesn't seem very efficient (due to the multiple loc
calls), and I get duplicate rows instead of removing the rows. (I don't want to use drop_duplicates
, as there might be duplicates in the first place that I don't want to drop.)
data_frame.set_index(['book', 'country'], inplace=True)
for key, values in rows_to_remove.items():
    data_frame.loc[key] = data_frame.loc[key][~data_frame.loc[key]['value'].isin(values)]
data_frame.reset_index(inplace=True)
Upvotes: 1
Views: 163
Reputation: 862591
You can create a list of tuples and check membership with Index.isin
and boolean indexing
:
d = {
('book1', 'US'): [1, 5],
('book2', 'CA'): [3]
}
tups = [k + (x, ) for k, v in d.items() for x in v]
df = df[~df.set_index(['book','country','value']).index.isin(tups)]
print (df)
book country value
1 book1 US 9
3 book2 MX 7
5 book1 CA 1
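To see what the comprehension builds: it flattens the dict into one (book, country, value) tuple per value in each list, which is what Index.isin then matches against. A minimal illustration:

```python
d = {
    ('book1', 'US'): [1, 5],
    ('book2', 'CA'): [3],
}

# Expand each key tuple into one full tuple per value in its list.
tups = [k + (x,) for k, v in d.items() for x in v]
print(tups)
# → [('book1', 'US', 1), ('book1', 'US', 5), ('book2', 'CA', 3)]
```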
To set 0
by condition, use DataFrame.loc
:
df.loc[df.set_index(['book','country','value']).index.isin(tups), 'value'] = 0
print (df)
book country value
0 book1 US 0
1 book1 US 9
2 book1 US 0
3 book2 MX 7
4 book2 CA 0
5 book1 CA 1
Another solution:
tups = [k + (x, ) for k, v in d.items() for x in v]
df1 = pd.DataFrame(tups, columns=['book','country','value'])
df2 = pd.concat([df, df1, df1], ignore_index=True)
df = df2[~df2.duplicated(keep=False)]
print (df)
book country value
1 book1 US 9
3 book2 MX 7
5 book1 CA 1
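As a further sketch (my own variation, not part of the answer above), the same removal can be done with a left merge and indicator=True, which avoids building an index; note that merge produces a new RangeIndex, so the original row labels are not preserved:

```python
import pandas as pd

df = pd.DataFrame({
    'book': ['book1', 'book1', 'book1', 'book2', 'book2', 'book1'],
    'country': ['US', 'US', 'US', 'MX', 'CA', 'CA'],
    'value': [1, 9, 5, 7, 3, 1],
})
d = {('book1', 'US'): [1, 5], ('book2', 'CA'): [3]}

# Build a DataFrame of the rows to remove, then tag matches with a left merge.
tups = [k + (x,) for k, v in d.items() for x in v]
removals = pd.DataFrame(tups, columns=['book', 'country', 'value'])
merged = df.merge(removals, how='left', indicator=True)

# Keep only rows that appeared in df but not in removals.
out = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')
print(out)
```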
Upvotes: 2