Reputation: 51
As a part of data comparison in Python, i have a dataframe's output. As you can see, PROD_
and PROJ_
data are compared.
Sample:
print (df)
PROD_Label PROJ_Label Diff_Label PROD_OAD PROJ_OAD \
0 Energy Energy True 1.94 1.94
1 Food and Beverage Food and Beverage True 1.97 1.97
2 Healthcare Healthcare True 8.23 8.23
3 Consumer Products Consumer Products True 3.67 NaN
4 Retailers Retailers True 5.88 NaN
Diff_OAD PROD_OAD_Tin PROJ_OAD_Tin Diff_OAD_Tin
0 True 0.02 0.02 True
1 True 0.54 0.01 False
2 True 0.05 0.05 True
3 False 0.02 0.02 True
4 False 0.06 0.06 True
String columns like PROD_Label
, PROJ_Label
are "non-null objects". Here the comparison results in true/false and as expected.
While for numeric columns like PROD_OAD
, PROJ_OAD
, PROD_OAD_Tin
, PROJ_OAD_Tin
are "non-null float64". Currently my output is showing the comparison as true and false (as above). but i expect this to be as the actual differences, like below but only for the numeric columns.
Is there a way to specify the particular column names and get the difference of results to be dumped in the Diff_
column.
Please note i don't want to compare all the PROD_
and PROJ_
columns. String's differences are already correct in true/false. Just looking for some specific columns which are in numeric format.
Upvotes: 1
Views: 67
Reputation: 862761
I think if exist only numeric columns with same structure is possible extract only numeric columns and get unique values which are used in for
with sub
:
a = df.select_dtypes([np.number]).columns.str.split('_', n=1).str[1].unique()
print (a)
Index(['OAD', 'OAD_Tin'], dtype='object')
for x in a:
df['Diff_' + x] = df['PROJ_' + x].sub(df['PROD_' + x], fill_value=0)
print (df)
PROD_Label PROJ_Label Diff_Label PROD_OAD PROJ_OAD \
0 Energy Energy True 1.94 1.94
1 Food and Beverage Food and Beverage True 1.97 1.97
2 Healthcare Healthcare True 8.23 8.23
3 Consumer Products Consumer Products True 3.67 NaN
4 Retailers Retailers True 5.88 NaN
Diff_OAD PROD_OAD_Tin PROJ_OAD_Tin Diff_OAD_Tin
0 0.00 0.02 0.02 0.00
1 0.00 0.54 0.01 -0.53
2 0.00 0.05 0.05 0.00
3 -3.67 0.02 0.02 0.00
4 -5.88 0.06 0.06 0.00
Upvotes: 2