Henrik Poulsen
Henrik Poulsen

Reputation: 995

Finding nearest value between Pandas dataframes

I'm currently trying to generate a quartile map of a few key figures.

My quartiles are in a Pandas DataFrame looking like this:

                 0,05    0,1   0,25   0,33  
IndicatorName
indicator 1      10653  10512  10096   9857
indicator 2      2,85   2,87   3,01   3,11
indicator 3      1,66   1,75   1,84    1,9
indicator 4      13,01  11,78   8,55   7,64

This is a quartile mapping from a few hundred users I then query my sql database and get values for One user and load this into a DF

                value
IndicatorName
indicator1      9917.00
indicator2      3.10
indicator3      1.86
indicator4      13.74

What I want to do now is to create a new column in my second DF containing an indication of which quartile the value lies within (nearest matching value):

                value     quartile
IndicatorName
indicator1      9917.00   0,33
indicator2      3.10      0,33
indicator3      1.86      0,25
indicator4      13.74     0,05

How would you go about comparing dataframes like this?

Upvotes: 2

Views: 1822

Answers (1)

jezrael
jezrael

Reputation: 863301

Zero step is replace , to . in df1 and cast to floats.

df1 = df1.replace(',','.', regex=True).astype(float)

Or:

df1 = pd.read_csv(file, decimal=',')

Also is necessary match indices, so if only difference is whitespaces, remove it:

df1.index = df1.index.str.replace('\s+','')

Then subtract column value by sub, get abs values and find column of minimal value by DataFrame.idxmin:

df2['quartile'] = df1.sub(df2['value'],axis=0).abs().idxmin(axis=1)
print (df2)
                 value quartile
IndicatorName                  
indicator1     9917.00     0,33
indicator2        3.10     0,33
indicator3        1.86     0,25
indicator4       13.74     0,05

Details:

print (df1.sub(df2['value'],axis=0))
                 0,05     0,1    0,25   0,33
IndicatorName                               
indicator1     736.00  595.00  179.00 -60.00
indicator2      -0.25   -0.23   -0.09   0.01
indicator3      -0.20   -0.11   -0.02   0.04
indicator4      -0.73   -1.96   -5.19  -6.10

print (df1.sub(df2['value'],axis=0).abs())
                 0,05     0,1    0,25   0,33
IndicatorName                               
indicator1     736.00  595.00  179.00  60.00
indicator2       0.25    0.23    0.09   0.01
indicator3       0.20    0.11    0.02   0.04
indicator4       0.73    1.96    5.19   6.10

Upvotes: 1

Related Questions