aghtaal
aghtaal

Reputation: 307

comparison between two dataframes and find highest difference

I have two dataframes df1 and df2. Both are indexed the same with [i_batch, i_example] The columns are different rmse errors. I would like to find [i_batch, i_example] that df1 is a lot lower than df2, or find the rows that df1 has less error than df2 based on the common [i_batch, i_example]. Note that it is possible that a specific [i_batch, i_example] only happens in one of the df1 or df2. But I need to only consider [i_batch, i_example] that exists in both df1 and df2.

df1 = 

                   rmse_ACCELERATION  rmse_CENTER_X  rmse_CENTER_Y  rmse_HEADING  rmse_LENGTH  rmse_TURN_RATE  rmse_VELOCITY  rmse_WIDTH
i_batch i_example                                                                                                                       
0       0.0                    1.064          1.018          0.995         0.991        1.190           0.967          1.029       1.532
1       0.0                    1.199          1.030          1.007         1.048        1.278           0.967          1.156       1.468
        1.0                    1.101          1.026          1.114         2.762        0.967           0.967          1.083       1.186
2       0.0                    1.681          1.113          1.090         1.001        1.670           0.967          1.205       1.160
        1.0                    1.637          1.122          1.183         0.987        1.521           0.967          1.191       1.278
        2.0                    1.252          1.035          1.035         2.507        1.108           0.967          1.210       1.595
3       0.0                    1.232          1.014          1.019         1.627        1.143           0.967          1.080       1.583
        1.0                    1.195          1.028          1.019         1.151        1.097           0.967          1.071       1.549
        2.0                    1.233          1.010          1.004         1.616        1.135           0.967          1.082       1.573
        3.0                    1.179          1.017          1.014         1.368        1.132           0.967          1.099       1.518

and

df2 =
                   rmse_ACCELERATION  rmse_CENTER_X  rmse_CENTER_Y  rmse_HEADING  rmse_LENGTH  rmse_TURN_RATE  rmse_VELOCITY  rmse_WIDTH
i_batch i_example                                                                                                                       
1       0.0                    0.071          0.034          0.048         0.114        0.006       1.309e-03          0.461       0.004
        1.0                    0.052          0.055          0.062         2.137        0.023       8.232e-04          0.357       0.011
2       0.0                    1.665          0.156          0.178         0.112        0.070       3.751e-03          2.326       0.016
        1.0                    0.880          0.210          0.088         0.055        0.202       1.449e-03          0.899       0.047
        2.0                    0.199          0.072          0.078         1.686        0.010       6.240e-04          0.239       0.008
3       0.0                    0.332          0.068          0.097         1.211        0.022       5.127e-04          0.167       0.016
        1.0                    0.252          0.075          0.070         0.368        0.013       5.295e-04          0.136       0.008
        2.0                    0.268          0.067          0.064         1.026        0.010       5.564e-04          0.175       0.010
        3.0                    0.171          0.051          0.054         0.473        0.011       4.150e-04          0.220       0.009
5       0.0                    0.014          0.099          0.119         0.389        0.123       3.846e-04          0.313       0.037

For instance how can I get the [i_batch, i_example] that `df1[rmse_ACCELERATION] < df1[rmse_ACCELERATION]'?

Upvotes: 0

Views: 34

Answers (2)

wwnde
wwnde

Reputation: 26686

Use .sub(), that directly matches the indices and subtracts matches.

df3=df1.sub(df2)

df3[(df3<0).any(1)]

Or go specific and try searching in df1 by

df1[(df1.sub(df2)<0).any(1)]



                  rmse_ACCELERATION  rmse_CENTER_X  rmse_CENTER_Y  \
i_batch i_example                                                    
2       0.0                    0.016          0.957          0.912   

                   rmse_HEADING  rmse_LENGTH  rmse_TURN_RATE  rmse_VELOCITY  \
i_batch i_example                                                             
2       0.0               0.889          1.6        0.963249         -1.121   

                   rmse_WIDTH  
i_batch i_example              
2       0.0             1.144  

Upvotes: 1

jcaliz
jcaliz

Reputation: 4021

Do a merge and then just filter according to your needs

df_merge = df_1.merge(df_2,
                       left_index=True,
                       right_index=True,
                       suffixes=('_1','_2'))
df_merge[
    df_merge['rmse_ACCELERATION_1'] < df_merge['rmse_ACCELERATION_2']
].index

However I don't see any records with same [i_batch, i_example] in both dataframes that passes the condition

Upvotes: 1

Related Questions