Reputation: 27
I am newbie to python and pandas. looking for help in subtraction between two different df column Sample data
df1 = pd.DataFrame({'label1': ['lab1', 'lab2', 'lab3', 'lab4'],
'compare1': [10, 20, 30, 40],
'col3': [100, 50, -30, -50]})
df2 = pd.DataFrame({'lable1': ['lab1', 'lab2', 'lab4', 'lab5'],
'compare1': [80,10, 40, 50],
'col3': [200, 60, 30, 50]})
df_diff = pd.DataFrame({'label': ['lab1', 'lab2', 'lab4', 'lab3', 'lab5'],
'df1_compare1': [10, 20, 40, 30, 0],
'df2_compare1': [80,10, 40, 0, 50],
'compare': [-70, 10, 0, 30, 50]})
Column name and count are same for both dfs.
Need help in subtraction (or other maths operation) between df1.compare1 and df2.compare2 if label1 and label2 has the same value.
for example, lab1, lab2 and lab 4(though lab 4 on different row) are having same label value. any rows which are not same in both df should show up at the bottom of the df with original value while other column data shows as 0.
Actual data are in CSV/excel MB in size, if possible looking for performance effective method
Does the non-null object has impact on performance?? This is the column type on label.
Upvotes: 0
Views: 372
Reputation: 42916
First merge
your dataframes on the column label
.
Finally create your compare
column by substracting compare1-compare2
:
df_diff = df1[['label1', 'compare1']].merge(df2[['label1', 'compare1']],
on='label1',
how='outer',
suffixes=['_df1', '_df2'])
df_diff['compare'] = df_diff['compare1_df1'].fillna(0) - df_diff['compare1_df2'].fillna(0)
label1 compare1_df1 compare1_df2 compare
0 lab1 10.0 80.0 -70.0
1 lab2 20.0 10.0 10.0
2 lab3 30.0 NaN 30.0
3 lab4 40.0 40.0 0.0
4 lab5 NaN 50.0 -50.0
If you have 20 min time, here's a good read on how merging
works: Pandas Merging 101
Upvotes: 1