Reputation: 79
I have a DataFrame like this:
data = {'Name':['Tom', 'Jack', 'nick', 'juli', 'Tom', 'nick', 'juli','nick', 'juli','Tom'], 'subject': ['eng', 'maths', 'geo', 'maths', 'science', 'geo', 'maths', 'maths', 'geo', 'science'], 'marks':[99, 98, 95, 90, 99, 98, 97, 95, 96, 98]}
df1 = pd.DataFrame(data)
df1
Name subject marks
0 Tom eng 99
1 Jack maths 98
2 nick geo 95
3 juli maths 90
4 Tom science 99
5 nick geo 98
6 juli maths 97
7 nick maths 95
8 juli geo 96
9 Tom science 98
another dataframe as :
data2 = {'Name':['Jack', 'nick', 'Tom', 'juli', 'Tom', 'nick','nick', 'juli'], 'subject': ['eng', 'maths', 'geo', 'maths', 'science', 'geo', 'maths', 'geo']}
df2 = pd.DataFrame(data2)
df2
Name subject
0 Jack eng
1 nick maths
2 Tom geo
3 juli maths
4 Tom science
5 nick geo
6 nick maths
7 juli geo
I want to filter df2 based on the combinations of 'Name' and 'subject' in df1: if a particular combination of 'Name' and 'subject' appears more than once in df1, it should be matched against df2, and the matching rows of df2 returned as output.
Desired output:
pd.DataFrame({'Name': ['nick', 'juli', 'Tom'], 'subject': ['geo', 'maths', 'science']})
Name subject
0 nick geo
1 juli maths
2 Tom science
Can anyone help without using the merge option?
Upvotes: 1
Views: 48
Reputation: 862581
I believe you need to filter only the rows with duplicated values, using DataFrame.duplicated with keep=False chained with the same method without this parameter (so one representative row per repeated combination remains), and then use merge for an inner join:
df11 = df1[df1.duplicated(subset=['Name', 'subject'], keep=False) &
           df1.duplicated(subset=['Name', 'subject'])]
df3 = df11.merge(df2, suffixes=('_', ''))[df2.columns]
print(df3)
Name subject
0 nick geo
1 juli maths
2 Tom science
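To make the chained duplicated calls concrete, here is a small sketch (reusing the question's df1) showing which rows survive the mask; note that in this data every repeated pair occurs exactly twice, so exactly one row per pair remains:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Tom', 'Jack', 'nick', 'juli', 'Tom',
                             'nick', 'juli', 'nick', 'juli', 'Tom'],
                    'subject': ['eng', 'maths', 'geo', 'maths', 'science',
                                'geo', 'maths', 'maths', 'geo', 'science'],
                    'marks': [99, 98, 95, 90, 99, 98, 97, 95, 96, 98]})

# keep=False marks every row whose (Name, subject) pair is repeated;
# the default keep='first' marks all occurrences except the first.
# Their AND keeps the non-first rows of each repeated pair -- here,
# since each repeated pair occurs twice, one row per pair.
mask = (df1.duplicated(subset=['Name', 'subject'], keep=False) &
        df1.duplicated(subset=['Name', 'subject']))
df11 = df1[mask]
print(df11)
```

The surviving rows are the second occurrences of nick/geo, juli/maths and Tom/science, which is what gets merged with df2.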
Another similar idea is to first restrict df1 to the columns of df2 before the merge:
cols = df2.columns
df11 = df1.loc[df1[cols].duplicated(keep=False) & df1[cols].duplicated(), cols]
df3 = df11.merge(df2)
print(df3)
Name subject
0 nick geo
1 juli maths
2 Tom science
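Since the question asks to avoid merge, here is one possible merge-free sketch (an assumption of mine, not part of the answer above): collect the repeated (Name, subject) pairs from df1 into a plain set and filter df2 by membership:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Tom', 'Jack', 'nick', 'juli', 'Tom',
                             'nick', 'juli', 'nick', 'juli', 'Tom'],
                    'subject': ['eng', 'maths', 'geo', 'maths', 'science',
                                'geo', 'maths', 'maths', 'geo', 'science'],
                    'marks': [99, 98, 95, 90, 99, 98, 97, 95, 96, 98]})
df2 = pd.DataFrame({'Name': ['Jack', 'nick', 'Tom', 'juli', 'Tom',
                             'nick', 'nick', 'juli'],
                    'subject': ['eng', 'maths', 'geo', 'maths', 'science',
                                'geo', 'maths', 'geo']})

# all rows of df1 whose (Name, subject) pair appears more than once
dupes = df1[df1.duplicated(subset=['Name', 'subject'], keep=False)]
keys = set(zip(dupes['Name'], dupes['subject']))

# keep only df2 rows whose (Name, subject) pair is in that set -- no merge
mask = [pair in keys for pair in zip(df2['Name'], df2['subject'])]
df3 = df2[mask]
print(df3)
```

This returns the same three rows (juli/maths, Tom/science, nick/geo), just in df2's original row order.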
Upvotes: 1