Reputation: 2904
I'm having one base dataframe like below-
df1_data = {'id' :{0:'101',1:'102',2:'103',3:'104',4:'105'},
'sym1' :{0:'abc',1:'pqr',2:'xyz',3:'mno',4:'lmn'}}
df1 = pd.DataFrame(df1_data)
print df1
id sym1
0 101 abc
1 102 pqr
2 103 xyz
3 104 mno
4 105 lmn
From this dataframe, I want to check column sym1 available or not in other four dataframes column or not?
four different dataframes:
df2_data = {'sym2' :{0:'abc',1:'xxx',2:'xyz',3:'mno'},
'name' :{0:'a',1:'b',2:'c',3:'d'}}
df2 = pd.DataFrame(df2_data)
print df2
df3_data = {'sym2' :{0:'abc',1:'xxx',2:'xyz',3:'mno'},
'name' :{0:'h',1:'i',2:'k',3:'l'}}
df3 = pd.DataFrame(df2_data)
print df3
df4_data = {'sym2' :{0:'abc',1:'xxx',2:'xyz',3:'mno'},
'name' :{0:'p',1:'q',2:'r',3:'s'}}
df4 = pd.DataFrame(df4_data)
print df4
df5_data = {'sym2' :{0:'abc',1:'xxx',2:'xyz',3:'mno'},
'name' :{0:'w',1:'x',2:'y',3:'z'}}
df5 = pd.DataFrame(df5_data)
print df5
In column sym2 available in Dataframes df2,df3,df4,df5 may contain same symbols or may not contain same symbols. My intension is to check sym1 column values available or not in df2,df3,df4,df5 dataframes sym2 column values or not?
Expected output-
id sym1
0 102 pqr
1 105 lmn
conclusion-
symbols pqr and lmn not available in sym2 column of dataframes df2,df3,df4 and df5.
Upvotes: 3
Views: 168
Reputation: 294488
isin
to check if each element of df1.sym1
is in some other iterable pd.concat
to string all other dataframes togetherdf1[~df1.sym1.isin(pd.concat([df2, df3, df4, df5]).sym2)]
id sym1
1 102 pqr
4 105 lmn
numpy
variant that is 3 times quickerdf1[~df1.sym1.isin(np.concatenate([d.sym2.values for d in [df2, df3, df4, df5]]))]
Upvotes: 5
Reputation: 863166
Another solution for comparing with merge
and parameter indicator
:
dfs = [df2,df3,df4,df5]
df = pd.concat(dfs, keys=['df2','df3','df4','df5'])
print (df)
name sym2
df2 0 a abc
1 b xxx
2 c xyz
3 d mno
df3 0 a abc
1 b xxx
2 c xyz
3 d mno
df4 0 p abc
1 q xxx
2 r xyz
3 s mno
df5 0 w abc
1 x xxx
2 y xyz
3 z mno
merged = pd.merge(df.rename_axis(['dfs','idx']).reset_index(),
df1,
left_on='sym2',
right_on='sym1',
how='outer',
indicator=True)
print (merged)
dfs idx name sym2 id sym1 _merge
0 df2 0.0 a abc 101 abc both
1 df3 0.0 a abc 101 abc both
2 df4 0.0 p abc 101 abc both
3 df5 0.0 w abc 101 abc both
4 df2 1.0 b xxx NaN NaN left_only
5 df3 1.0 b xxx NaN NaN left_only
6 df4 1.0 q xxx NaN NaN left_only
7 df5 1.0 x xxx NaN NaN left_only
8 df2 2.0 c xyz 103 xyz both
9 df3 2.0 c xyz 103 xyz both
10 df4 2.0 r xyz 103 xyz both
11 df5 2.0 y xyz 103 xyz both
12 df2 3.0 d mno 104 mno both
13 df3 3.0 d mno 104 mno both
14 df4 3.0 s mno 104 mno both
15 df5 3.0 z mno 104 mno both
16 NaN NaN NaN NaN 102 pqr right_only
17 NaN NaN NaN NaN 105 lmn right_only
print (merged.loc[merged['_merge']=='right_only', ['id','sym1']])
id sym1
16 102 pqr
17 105 lmn
print (merged.loc[merged['_merge']=='left_only', ['dfs', 'sym2']])
dfs sym2
4 df2 xxx
5 df3 xxx
6 df4 xxx
7 df5 xxx
Upvotes: 4