Reputation: 191
Say I have a pandas DataFrame with four columns: A,B,C,D.
import pandas as pd
my_df = pd.DataFrame({'A': [0,1,4,9], 'B': [1,7,5,7],'C':[1,1,1,1],'D':[2,2,2,2]})
I also have a list of tuples:
my_tuples = [(0,1),(4,5),(9,9)]
I want to keep only the rows of the dataframe where the value of (my_df['A'],my_df['B'])
is equal to one of the tuples in my_tuples.
In this example, that would be rows 0 and 2.
Is there a good way to do this? I'd appreciate any help.
Upvotes: 11
Views: 2056
Reputation: 30930
We can also use DataFrame.loc with map.
my_df.loc[list(map(lambda x: x in my_tuples, zip(my_df['A'], my_df['B']))),:]
#my_df.loc[[row in my_tuples for row in zip(my_df['A'], my_df['B'])],:]
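One thing to keep in mind: `x in my_tuples` scans the whole list for every row, so it can get slow if the list of tuples is long. Here is a minimal sketch, assuming the tuples are hashable (true for tuples of ints), that builds a set first. The names `tuple_set`, `mask` and `result` are only for illustration and are not part of the answer above.

```python
import pandas as pd

my_df = pd.DataFrame({'A': [0, 1, 4, 9], 'B': [1, 7, 5, 7],
                      'C': [1, 1, 1, 1], 'D': [2, 2, 2, 2]})
my_tuples = [(0, 1), (4, 5), (9, 9)]

# Assumption: every tuple is hashable, so a set gives average O(1) lookups
# instead of scanning the list once per row.
tuple_set = set(my_tuples)

# Boolean mask: True where the (A, B) pair of a row is one of the tuples.
mask = [pair in tuple_set for pair in zip(my_df['A'], my_df['B'])]

# .loc with a boolean list keeps the original index labels.
result = my_df.loc[mask]
print(result)
```

For a three-element list the difference is probably negligible; the set should only start to matter as `my_tuples` grows.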
Time comparison
%%timeit
my_df.loc[list(map(lambda x: x in my_tuples, zip(my_df['A'], my_df['B']))),:]
394 µs ± 24.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%%timeit
df = my_df.merge(pd.DataFrame(my_tuples, columns=['A','B']))
3.56 ms ± 248 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
df = my_df[my_df.set_index(['A','B']).index.isin(my_tuples)]
3.99 ms ± 139 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
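Note that the timings above were taken on the 4-row example frame, where fixed overhead dominates, so the ranking may not hold for larger data. A rough sketch for repeating the comparison on a bigger frame follows. The frame size, the random data, and the names `big_df`, `big_tuples`, `with_loc`, `with_merge` and `with_isin` are all assumptions made up for this illustration, and no timing results are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 10,000 rows with A and B drawn from 0..99.
rng = np.random.default_rng(0)
n_rows = 10_000
big_df = pd.DataFrame({'A': rng.integers(0, 100, n_rows),
                       'B': rng.integers(0, 100, n_rows),
                       'C': 1, 'D': 2})
# Hypothetical lookup list of 50 distinct (A, B) pairs.
big_tuples = [(a, a + 1) for a in range(0, 100, 2)]


def with_loc():
    # List comprehension + .loc, as in this answer.
    return big_df.loc[[row in big_tuples
                       for row in zip(big_df['A'], big_df['B'])]]


def with_merge():
    # Inner merge on the shared columns A and B.
    return big_df.merge(pd.DataFrame(big_tuples, columns=['A', 'B']))


def with_isin():
    # MultiIndex membership test.
    return big_df[big_df.set_index(['A', 'B']).index.isin(big_tuples)]


for fn in (with_loc, with_merge, with_isin):
    seconds = timeit.timeit(fn, number=3) / 3
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```

All three functions should select the same rows; only the merge version renumbers the index.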
Upvotes: 4
Reputation: 863146
Use DataFrame.merge with a DataFrame created from the tuples. No on parameter is needed, because merge defaults to the intersection of the columns present in both DataFrames, here A and B:
df = my_df.merge(pd.DataFrame(my_tuples, columns=['A','B']))
print(df)
   A  B  C  D
0  0  1  1  2
1  4  5  1  2
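As the output shows, merge returns a fresh 0..n-1 index rather than the original row labels. If the original labels matter, one possible workaround is sketched below: carry the index through the merge as a column. The name `keys` is introduced here for illustration, and the drop_duplicates call is an added guard, since a tuple repeated in the list would otherwise repeat the matching rows.

```python
import pandas as pd

my_df = pd.DataFrame({'A': [0, 1, 4, 9], 'B': [1, 7, 5, 7],
                      'C': [1, 1, 1, 1], 'D': [2, 2, 2, 2]})
my_tuples = [(0, 1), (4, 5), (9, 9)]

# Lookup frame built from the tuples; duplicates dropped so that a repeated
# tuple cannot duplicate rows in the merged result.
keys = pd.DataFrame(my_tuples, columns=['A', 'B']).drop_duplicates()

# reset_index() turns the unnamed index into a column called 'index',
# which survives the merge and is then restored as the index.
df = my_df.reset_index().merge(keys, on=['A', 'B']).set_index('index')
df.index.name = None  # drop the leftover 'index' label

print(df)
```

This assumes the index of my_df is unnamed; with a named index, the column created by reset_index() takes that name instead of 'index'.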
Or:
df = my_df[my_df.set_index(['A','B']).index.isin(my_tuples)]
print(df)
   A  B  C  D
0  0  1  1  2
2  4  5  1  2
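A possible variation on the second approach, sketched here as an alternative rather than a recommendation: build the MultiIndex directly from the two key columns with pd.MultiIndex.from_frame instead of calling set_index on the whole frame. The names `key_index` and `mask` are only for illustration.

```python
import pandas as pd

my_df = pd.DataFrame({'A': [0, 1, 4, 9], 'B': [1, 7, 5, 7],
                      'C': [1, 1, 1, 1], 'D': [2, 2, 2, 2]})
my_tuples = [(0, 1), (4, 5), (9, 9)]

# MultiIndex made only from columns A and B; C and D are not touched.
key_index = pd.MultiIndex.from_frame(my_df[['A', 'B']])

# isin returns a boolean NumPy array aligned with the rows of my_df.
mask = key_index.isin(my_tuples)

df = my_df[mask]
print(df)
```

As with the set_index version, the original row labels are preserved.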
Upvotes: 10