Reputation: 33
I have a dataframe with real estate object parameters ('Rooms', 'Square' etc). I want to replace rows where 'Rooms' parameter equals to 0 to the corresponding number of rooms from a ('Rooms' - 'mean Square') dataframe I created from the same initial dataset.
I would use the .replace method, but the problem is - the actual Square values for 0 Room rows don't exactly match the mean values.
I'm new to pandas, so all solutions I would try are based on putting the column values into python lists and using cycles, which is a nightmare. All other similar topics I've seen on stackoverflow are good only for exact matches.
data.loc[data['Rooms'] == 0][['Rooms', 'Square']]
Rooms Square
1397 0.0 138.427694
1981 0.0 212.932361
2269 0.0 41.790881
3911 0.0 49.483501
4366 0.0 81.491446
4853 0.0 2.377248
6149 0.0 38.697117
8834 0.0 87.762616
mean_square = data.loc[(data['Rooms'] < 6) & (data['Rooms'] > 0)].groupby('Rooms', as_index=False)['Square'].mean()
Rooms Square
0 1.0 41.323277
1 2.0 56.788214
2 3.0 76.903234
3 4.0 98.377544
4 5.0 122.614941
For example, for item 1397 I would expect 0.0 changed to 5.0 (~138 sqm is closest to ~122 sq mean for 4 rooms).
Upvotes: 1
Views: 137
Reputation: 59549
This is pd.merge_asof
. Taking your first slice as df1
and the second as df2
, we can find the nearest match, 'Rooms_y'
, and align the result on the original index.
import pandas as pd
df1['Rooms'] = (pd.merge_asof(df1.reset_index().sort_values('Square'),
df2.sort_values('Square'),
on='Square', direction='nearest')
.set_index('index')['Rooms_y']
)
Rooms Square
1397 5.0 138.427694
1981 5.0 212.932361
2269 1.0 41.790881
3911 2.0 49.483501
4366 3.0 81.491446
4853 1.0 2.377248
6149 1.0 38.697117
8834 4.0 87.762616
Upvotes: 2