Reputation: 73
I have 2 dataframes like this:
df1
ID Value1 Amount2
1 100 10
2 400 20
3 300 50
df2
ID Value1 Amount2
2 200 20
3 300 30
I want to get a table like this form these two dfs.
Desired Output:
ID Value Amount Difference_Value Difference_Amount
df1 df2 df1 df2
1 100 0 10 0 100 10
2 400 200 20 20 200 0
3 300 300 50 30 0 20
I am a bit new to Multilevel index. I know this is possible but didn't find other questions helpful for my need.
I need this Value, Amount, Difference_Value and Difference_amount
columns to be merged cells in excel so that I need to know this.
Thank you.
Upvotes: 2
Views: 803
Reputation: 20669
You can try using df.merge
then split in column using pd.index.str.split
Use df.assign
with pd.Series.sub
for assigning difference values.
d = df.merge(df1,how='outer',on='ID',suffixes=('-df1','-df2')
).fillna(0)
d
ID Value1-df1 Amount2-df1 Value1-df2 Amount2-df2
0 1 100 10 0.0 0.0
1 2 400 20 200.0 20.0
2 3 300 50 300.0 30.0
d = d.assign(diff_value = d['Value1-df1'].sub(d['Value1-df2']),
diff_amount = d['Amount2-df1'].sub(d['Amount2-df2'])).set_index('ID')
d
Value1-df1 Amount2-df1 Value1-df2 Amount2-df2 diff_value diff_amount
ID
1 100 10 0.0 0.0 100.0 10.0
2 400 20 200.0 20.0 200.0 0.0
3 300 50 300.0 30.0 0.0 20.0
Now, split the column at '-'
with expand=True
for getting MultiIndex
then use df.sort_index
.
d.columns = d.columns.str.split('-',expand=True) #expand= True makes it MultiIndex
d.sort_index(axis=1)
Amount2 Value1 diff_amount diff_value
df1 df2 df1 df2 NaN NaN
ID
1 10 0.0 100 0.0 10.0 100.0
2 20 20.0 400 200.0 0.0 200.0
3 50 30.0 300 300.0 20.0 0.0
Upvotes: 1
Reputation: 862521
It is possible if MultiIndex
for all columns:
First convert ID
to index by DataFrame.set_index
, subtract by DataFrame.sub
and join together by concat
, last for change MultiIndex
is used DataFrame.swaplevel
and DataFrame.sort_index
:
df1 = df1.set_index('ID')
df2 = df2.set_index('ID')
df3 = df1.sub(df2, fill_value=0)
df = (pd.concat([df1, df2, df3], axis=1, keys=(['df1','df2', 'diff']))
.swaplevel(1,0, axis=1)
.fillna(0)
.sort_index(axis=1))
print (df)
Amount2 Value1
df1 df2 diff df1 df2 diff
ID
1 10 0.0 10.0 100 0.0 100.0
2 20 20.0 0.0 400 200.0 200.0
3 50 30.0 20.0 300 300.0 0.0
If try join together MultiIndex
and no MultiIndex
Dataframes, get tuples instead MultiIndex
:
df1 = df1.set_index('ID')
df2 = df2.set_index('ID')
df3 = df1.sub(df2, fill_value=0)
df = (pd.concat([df1, df2, df3], axis=1, keys=(['df1','df2']))
.swaplevel(1,0, axis=1)
.fillna(0)
.sort_index(axis=1)
.join(df3.add_prefix('Diff_')))
print (df)
(Amount2, df1) (Amount2, df2) (Value1, df1) (Value1, df2) Diff_Value1 \
ID
1 10 0.0 100 0.0 100.0
2 20 20.0 400 200.0 200.0
3 50 30.0 300 300.0 0.0
Diff_Amount2
ID
1 10.0
2 0.0
3 20.0
Upvotes: 2