Maxim  Biryukov
Maxim Biryukov

Reputation: 33

How to change values in a dataframe to values from another dataframe, with closest match?

I have a dataframe with real estate object parameters ('Rooms', 'Square' etc). I want to replace rows where 'Rooms' parameter equals to 0 to the corresponding number of rooms from a ('Rooms' - 'mean Square') dataframe I created from the same initial dataset.

I would use the .replace method, but the problem is - the actual Square values for 0 Room rows don't exactly match the mean values.

I'm new to pandas, so all solutions I would try are based on putting the column values into python lists and using cycles, which is a nightmare. All other similar topics I've seen on stackoverflow are good only for exact matches.

This is the slice and part of the initial dataframe where I want the values ('Rooms') to be changed:

data.loc[data['Rooms'] == 0][['Rooms', 'Square']]

      Rooms Square
1397  0.0   138.427694
1981  0.0   212.932361
2269  0.0   41.790881
3911  0.0   49.483501
4366  0.0   81.491446
4853  0.0   2.377248
6149  0.0   38.697117
8834  0.0   87.762616

This is the code that creates the 'Rooms' - 'mean Square' dataframe:

mean_square = data.loc[(data['Rooms'] < 6) & (data['Rooms'] > 0)].groupby('Rooms', as_index=False)['Square'].mean()

This is the result:

    Rooms   Square
0   1.0     41.323277
1   2.0     56.788214
2   3.0     76.903234
3   4.0     98.377544
4   5.0     122.614941

For example, for item 1397 I would expect 0.0 changed to 5.0 (~138 sqm is closest to ~122 sq mean for 4 rooms).

Upvotes: 1

Views: 137

Answers (1)

ALollz
ALollz

Reputation: 59549

This is pd.merge_asof. Taking your first slice as df1 and the second as df2, we can find the nearest match, 'Rooms_y', and align the result on the original index.

import pandas as pd

df1['Rooms'] = (pd.merge_asof(df1.reset_index().sort_values('Square'), 
                              df2.sort_values('Square'), 
                              on='Square', direction='nearest')
                   .set_index('index')['Rooms_y']
                )

Output:

      Rooms      Square
1397    5.0  138.427694
1981    5.0  212.932361
2269    1.0   41.790881
3911    2.0   49.483501
4366    3.0   81.491446
4853    1.0    2.377248
6149    1.0   38.697117
8834    4.0   87.762616

Upvotes: 2

Related Questions