Finding nearest value between Pandas dataframes

Question

I'm currently trying to generate a quartile map of a few key figures.

My quartiles are in a Pandas DataFrame looking like this:

                 0,05    0,1   0,25   0,33  
IndicatorName
indicator 1      10653  10512  10096   9857
indicator 2      2,85   2,87   3,01   3,11
indicator 3      1,66   1,75   1,84    1,9
indicator 4      13,01  11,78   8,55   7,64

This is a quartile mapping from a few hundred users I then query my sql database and get values for One user and load this into a DF

                value
IndicatorName
indicator1      9917.00
indicator2      3.10
indicator3      1.86
indicator4      13.74

What I want to do now is to create a new column in my second DF containing an indication of which quartile the value lies within (nearest matching value):

                value     quartile
IndicatorName
indicator1      9917.00   0,33
indicator2      3.10      0,33
indicator3      1.86      0,25
indicator4      13.74     0,05

How would you go about comparing dataframes like this?

jezrael · Accepted Answer

Zero step is replace , to . in df1 and cast to floats.

df1 = df1.replace(',','.', regex=True).astype(float)

Or:

df1 = pd.read_csv(file, decimal=',')

Also is necessary match indices, so if only difference is whitespaces, remove it:

df1.index = df1.index.str.replace('\s+','')

Then subtract column value by sub, get abs values and find column of minimal value by DataFrame.idxmin:

df2['quartile'] = df1.sub(df2['value'],axis=0).abs().idxmin(axis=1)
print (df2)
                 value quartile
IndicatorName                  
indicator1     9917.00     0,33
indicator2        3.10     0,33
indicator3        1.86     0,25
indicator4       13.74     0,05

Details:

print (df1.sub(df2['value'],axis=0))
                 0,05     0,1    0,25   0,33
IndicatorName                               
indicator1     736.00  595.00  179.00 -60.00
indicator2      -0.25   -0.23   -0.09   0.01
indicator3      -0.20   -0.11   -0.02   0.04
indicator4      -0.73   -1.96   -5.19  -6.10

print (df1.sub(df2['value'],axis=0).abs())
                 0,05     0,1    0,25   0,33
IndicatorName                               
indicator1     736.00  595.00  179.00  60.00
indicator2       0.25    0.23    0.09   0.01
indicator3       0.20    0.11    0.02   0.04
indicator4       0.73    1.96    5.19   6.10

Finding nearest value between Pandas dataframes

Answers (1)

Related Questions