Reputation: 239
df1 = pd.DataFrame(
{
"empid" : [1,2,3,4,5,6],
"empname" : ['a', 'b','c','d','e','f'],
"empcity" : ['aa','bb','cc','dd','ee','ff']
})
df1
df2 = pd.DataFrame(
{
"empid" : [1,2,3,4,5,6],
"empname" : ['a', 'b','m','d','n','f'],
"empcity" : ['aa','bb','cc','ddd','ee','fff']
})
df2
df_all = pd.concat([df1.set_index('empid'),df2.set_index('empid')],axis='columns',keys=['first','second'])
df_all
df_final = df_all.swaplevel(axis = 'columns')[df1.columns[1:]]
df_final
Based on df_final data frame, need to create following output. here comparison column need to created dynamically for every identical column as i'm trying to compare two data frame(both data frame structure and column name are same) where number of columns are more than 300
Upvotes: 0
Views: 348
Reputation: 19322
==
You can do this with a simple ==
between two dataframes that you need to compare. Let's start with the original 2 dataframes df1
and df2
-
first = df1.set_index('empid')
second = df2.set_index('empid')
comparisons = first==second #<---
output = pd.concat([first, second, comparisons], axis=1,keys=['first','second', 'comparisons'])
#Swapping level and reindexing, borrowed from Jezrael's excellent answer
output = output.swaplevel(axis=1).reindex(first.columns, axis=1, level=0)
print(output)
empname empcity
first second comparisons first second comparisons
empid
1 a a True aa aa True
2 b b True bb bb True
3 c m False cc cc True
4 d d True dd ddd False
5 e n False ee ee True
6 f f True ff fff False
In addition to the excellent answer by jezrael, I am adding an alternate way of doing this using pandas groupby.
#create comparisons
comparisons = (df_all.T
.groupby(level=-1)
.apply(lambda x: x.iloc[0]==x.iloc[1])
.T)
#add multi index columns
comparisons.columns = pd.MultiIndex.from_product([['comparison'],comparisons.columns])
#concatenate with original data
df_final = pd.concat([df_all, comparisons], axis='columns')
#Swapping level and reindexing, borrowed from Jezrael's excellent answer
df_final = (df_final.swaplevel(axis = 'columns')
.reindex(df1.set_index('empid')
.columns, axis=1, level=0))
print(df_final)
empname empcity
first second comparison first second comparison
empid
1 a a True aa aa True
2 b b True bb bb True
3 c m False cc cc True
4 d d True dd ddd False
5 e n False ee ee True
6 f f True ff fff False
Upvotes: 1
Reputation: 862511
Use DataFrame.stack
for possible compare all levels columns first
with second
, create new column in DataFrame.assign
and reshape back by DataFrame.unstack
with DataFrame.swaplevel
and DataFrame.reindex
for original order:
#original ordering
orig = df1.columns[1:].tolist()
print (orig)
['empname', 'empcity']
df_final = (df_all.stack()
.assign(comparions=lambda x: x['first'].eq(x['second']))
.unstack()
.swaplevel(axis = 'columns')
.reindex(orig, axis=1, level=0))
print (df_final)
empname empcity
first second comparions first second comparions
empid
1 a a True aa aa True
2 b b True bb bb True
3 c m False cc cc True
4 d d True dd ddd False
5 e n False ee ee True
6 f f True ff fff False
Upvotes: 2
Reputation:
(i) Use get_level_values
to get the label values for level 0
(ii) Iterate over the outcome of (i) and for each level=0
, do element-wise comparison using eq
between first
and second
(iii) use sort_index
to sort columns in desired order
for level_0 in df_final.columns.get_level_values(0).unique():
df_final[(level_0, 'comparison')] = df_final[(level_0, 'first')].eq(df_final[(level_0,'second')])
df_final = df_final.sort_index(level=0, sort_remaining=False, axis=1)
Output:
empcity empname
first second comparison first second comparison
empid
1 aa aa True a a True
2 bb bb True b b True
3 cc cc True c m False
4 dd ddd False d d True
5 ee ee True e n False
6 ff fff False f f True
Upvotes: 1