abra

Reputation: 191

pandas: get rows by comparing two columns of dataframe to list of tuples

Say I have a pandas DataFrame with four columns: A, B, C, D.

import pandas as pd
my_df = pd.DataFrame({'A': [0, 1, 4, 9], 'B': [1, 7, 5, 7], 'C': [1, 1, 1, 1], 'D': [2, 2, 2, 2]})

I also have a list of tuples:

my_tuples = [(0,1),(4,5),(9,9)]

I want to keep only the rows of the DataFrame where the pair (my_df['A'], my_df['B']) equals one of the tuples in my_tuples.

In this example, that would be rows 0 and 2.

Is there a good way to do this? I'd appreciate any help.

Upvotes: 11

Views: 2056

Answers (2)

ansev

Reputation: 30930

We can also use DataFrame.loc with map.

my_df.loc[list(map(lambda x: x in my_tuples, zip(my_df['A'], my_df['B']))),:]

#my_df.loc[[row in my_tuples for row in zip(my_df['A'], my_df['B'])],:]
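The test `x in my_tuples` scans the whole list for every row. As a minimal sketch, assuming the list of tuples may grow large, it can be converted to a set first so each lookup is a hash check. The names `tuple_set`, `mask`, and `result` are introduced here for illustration only.

```python
import pandas as pd

my_df = pd.DataFrame({'A': [0, 1, 4, 9], 'B': [1, 7, 5, 7],
                      'C': [1, 1, 1, 1], 'D': [2, 2, 2, 2]})
my_tuples = [(0, 1), (4, 5), (9, 9)]

# Set membership is a hash lookup; list membership is a linear scan.
tuple_set = set(my_tuples)

# Boolean mask: True where the (A, B) pair of a row is in the set.
mask = [pair in tuple_set for pair in zip(my_df['A'], my_df['B'])]

# Passing the mask to .loc keeps the matching rows and their original index.
result = my_df.loc[mask]
print(result)
```

For three tuples the difference is negligible; it should only matter when the list has many entries.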

Time comparison

%%timeit
my_df.loc[list(map(lambda x: x in my_tuples, zip(my_df['A'], my_df['B']))),:]
394 µs ± 24.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
df = my_df.merge(pd.DataFrame(my_tuples, columns=['A','B']))
3.56 ms ± 248 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


%%timeit
df = my_df[my_df.set_index(['A','B']).index.isin(my_tuples)]
3.99 ms ± 139 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
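These timings are for a four-row frame, where fixed per-call overhead probably dominates, so the ranking may change on larger inputs. A rough sketch for re-running the comparison on bigger data follows. The frame size, the random data, the tuple list, and the function names are all assumptions made for illustration, and no particular result is implied.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 5_000  # hypothetical frame size
big_df = pd.DataFrame({'A': rng.integers(0, 100, n),
                       'B': rng.integers(0, 100, n)})
big_tuples = [(i, i) for i in range(50)]  # hypothetical lookup list


def with_zip():
    # Row-by-row membership test on zipped columns.
    return big_df.loc[[row in big_tuples
                       for row in zip(big_df['A'], big_df['B'])]]


def with_merge():
    # Inner join on the shared columns A and B; the index is reset.
    return big_df.merge(pd.DataFrame(big_tuples, columns=['A', 'B']))


def with_isin():
    # MultiIndex membership test; the original index is kept.
    return big_df[big_df.set_index(['A', 'B']).index.isin(big_tuples)]


# Check that the index-preserving approaches agree before timing them.
assert with_zip().equals(with_isin())

for fn in (with_zip, with_merge, with_isin):
    seconds = timeit.timeit(fn, number=3)
    print(f"{fn.__name__}: {seconds / 3:.6f} s per call")
```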

Upvotes: 4

jezrael

Reputation: 863146

Use DataFrame.merge with a DataFrame built from the tuples. With no on parameter, merge joins by default on the intersection of the columns present in both DataFrames, here A and B:

df = my_df.merge(pd.DataFrame(my_tuples, columns=['A','B']))
print(df)
   A  B  C  D
0  0  1  1  2
1  4  5  1  2

Or:

df = my_df[my_df.set_index(['A','B']).index.isin(my_tuples)]
print(df)
   A  B  C  D
0  0  1  1  2
2  4  5  1  2
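Note that the two outputs differ in their index: merge returns a fresh 0..n-1 index, while the isin version keeps the original row labels 0 and 2. As a hedged variation on the second approach, the sketch below builds the MultiIndex from the two key columns only with pd.MultiIndex.from_frame instead of calling set_index on the whole frame. The names `keys`, `mask`, `kept`, and `merged` are illustrative.

```python
import pandas as pd

my_df = pd.DataFrame({'A': [0, 1, 4, 9], 'B': [1, 7, 5, 7],
                      'C': [1, 1, 1, 1], 'D': [2, 2, 2, 2]})
my_tuples = [(0, 1), (4, 5), (9, 9)]

# Build a MultiIndex from the key columns only; C and D are not involved.
keys = pd.MultiIndex.from_frame(my_df[['A', 'B']])

# Boolean array: True where the (A, B) pair appears in my_tuples.
mask = keys.isin(my_tuples)

# Boolean indexing keeps the original row labels.
kept = my_df[mask]
print(kept.index.tolist())

# For contrast, the merge result gets a new default index.
merged = my_df.merge(pd.DataFrame(my_tuples, columns=['A', 'B']))
print(merged.index.tolist())
```

If later code relies on the original row labels, the isin form is the safer choice; calling reset_index(drop=True) on `kept` gives the same labels as the merge result.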

Upvotes: 10
