Reputation: 165
How can I filter a dataframe by a list of tuples, so that only rows with the same pairing are kept? I need a more elegant way of writing this. I'm trying not to use merge because it is less efficient.
I have a list of tuples called tup_list:
[('118', '35'), ('35', '35'), ('118', '202')]
Assuming the first element in each tuple is A, and the second is B, I am trying to filter my dataframe according to this tup_list, where the pairing needs to be the same.
Original dataframe:
A B
118 35
118 40
35 202
118 1
35 35
After filtering according to the tup_list, the new dataframe should be:
A B
118 35
35 35
Only exact pairings should be returned.
Currently I'm using df = df.merge(tup_list, on=['A','B'], how='inner'), but it is not very efficient because my actual data is larger.
Please advise on a more efficient way of writing this.
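For reference, here is a minimal sketch of what the merge baseline might look like when written out in full. It assumes columns A and B hold integers, and the names pairs and merged are made up for illustration. merge expects a DataFrame (or named Series) on the right-hand side, so the list of tuples is wrapped in one first and its strings are converted to integers.

```python
import pandas as pd

# Sample data from the question (assumption: the columns are integers)
df = pd.DataFrame({"A": [118, 118, 35, 118, 35], "B": [35, 40, 202, 1, 35]})
tup_list = [("118", "35"), ("35", "35"), ("118", "202")]

# Wrap the tuples in a DataFrame and convert the strings to ints so the
# key dtypes match the columns of df
pairs = pd.DataFrame(tup_list, columns=["A", "B"]).astype(int)

# An inner merge keeps only the rows whose (A, B) pair appears in pairs
merged = df.merge(pairs, on=["A", "B"], how="inner")
print(merged)
```

Note that merge builds a new index, so the original row labels are not kept in the result.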
Upvotes: 7
Views: 4640
Reputation: 627
With your tup_list and a dataframe named df, here is a one-liner for the requested output:
df[[x in tup_list for x in list(zip(df.A,df.B))]]
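The one-liner scans tup_list once per row, which can get slow for long lists. A possible variation, sketched here under the assumption that columns A and B are integers while tup_list holds strings as in the question, is to convert the pairs once and keep them in a set so each lookup is a hash lookup. The names wanted, mask and filtered are only illustrative.

```python
import pandas as pd

# Sample data from the question (assumption: the columns are integers)
df = pd.DataFrame({"A": [118, 118, 35, 118, 35], "B": [35, 40, 202, 1, 35]})
tup_list = [("118", "35"), ("35", "35"), ("118", "202")]

# Convert the string pairs to ints and store them in a set for fast lookups
wanted = {(int(a), int(b)) for a, b in tup_list}

# One boolean per row: True if the row's (A, B) pair is in the set
mask = [pair in wanted for pair in zip(df["A"], df["B"])]
filtered = df[mask]
print(filtered)
```

If the columns are stored as strings instead, the int conversion can be dropped and a plain set(tup_list) used.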
Upvotes: 4
Reputation: 14103
Use boolean indexing:
tup_list = [(118, 35), (35, 35), (118, 202)]
df[pd.Series(list(zip(df['A'], df['B']))).isin(tup_list)]
A B
0 118 35
4 35 35
list(zip(df['A'], df['B']))
turns your two columns into a list of tuples:
[(118, 35), (118, 40), (35, 202), (118, 1), (35, 35)]
You then turn that list into a Series and use isin
to get a boolean mask:
0 True
1 False
2 False
3 False
4 True
dtype: bool
This mask can then be used for boolean indexing.
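A related sketch, not part of the answer above, builds the pairs as a MultiIndex and calls its isin method. This returns a plain NumPy boolean array, so the mask is applied by position and does not depend on the index of df. By contrast, a freshly built pd.Series gets a default 0..n-1 index, which may fail to align if df has a different index. The sample data and the names mask and result are assumptions for illustration.

```python
import pandas as pd

# Sample data matching the answer above (integer columns and integer tuples)
df = pd.DataFrame({"A": [118, 118, 35, 118, 35], "B": [35, 40, 202, 1, 35]})
tup_list = [(118, 35), (35, 35), (118, 202)]

# Treat each (A, B) row as one MultiIndex entry and test membership
mask = pd.MultiIndex.from_frame(df[["A", "B"]]).isin(tup_list)

# mask is a NumPy boolean array, applied to the rows by position
result = df[mask]
print(result)
```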
Upvotes: 8
Reputation: 4137
With pandas.DataFrame.query you can also filter your dataframe according to your list of tuples:
import numpy as np
import pandas as pd
f = [('118', '35'), ('35', '35'), ('118', '202')]
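# Each tuple becomes an expression such as "A==118 and B==35"
idxs = [df.query('A==' + t[0] + ' and B==' + t[1]).index.values for t in f]
idxs = np.concatenate(idxs).tolist()
# The collected values are index labels, so select with .loc rather than .iloc
df2 = df.loc[idxs, :]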
print(df2)
# A B
# 0 118 35
# 4 35 35
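The same idea can also be sketched with a single query call instead of one call per tuple, by joining the per-pair conditions with "or". This assumes, as above, that the columns are integers and that the strings in f are trusted numeric literals, since they are pasted directly into the expression. The name expr is only illustrative.

```python
import pandas as pd

# Sample data from the question (assumption: the columns are integers)
df = pd.DataFrame({"A": [118, 118, 35, 118, 35], "B": [35, 40, 202, 1, 35]})
f = [("118", "35"), ("35", "35"), ("118", "202")]

# Build one expression such as
# "(A == 118 and B == 35) or (A == 35 and B == 35) or ..."
expr = " or ".join(f"(A == {a} and B == {b})" for a, b in f)

# A single pass over the dataframe; rows keep their original order and labels
df2 = df.query(expr)
print(df2)
```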
Upvotes: 0