Reputation: 187
I have a dataset of house sales with timestamped Periods(per quarter). I want to adjust the price according to the house pricing index change per region. I have a separate dataframe with 3 columns, the Quarter, the Region and the % change in price. I am currently achieving this by iterating over both dataframes. Is there a better way?
Minimal Example;
import pandas as pd
houses_df = pd.DataFrame({'HousePrice' : [100000, 250000, 125000, 320000],
'Period' : ['2020Q1', '2020Q2', '2020Q1', '2020Q3'],
'Region' : ['NY-West', 'NY-East', 'NY-West', 'NY-East']})
HPindex_df = pd.DataFrame({'Periods' : ['2020Q1', '2020Q2', '2020Q3'] * 2,
'Regions' : ['NY-West', 'NY-West', 'NY-West', 'NY-East', 'NY-East', 'NY-East'],
'PriceIndex' : [1.1, 1.13, 0.87, 1.35, 1.21, 1.11]})
for index, row in houses_df.iterrows():
for row1 in HPindex_df.itertuples():
a = row1
if row['Region'] == row1.Regions and row['Period'] == row1.Periods:
houses_df.loc[index, 'HousePrice'] = houses_df.loc[index, 'HousePrice'] * row1.PriceIndex
print(houses_df)
HousePrice Period Region
0 110000.0 2020Q1 NY-West
1 302500.0 2020Q2 NY-East
2 137500.0 2020Q1 NY-West
3 355200.0 2020Q3 NY-East
print(HPindex_df)
Periods Regions PriceIndex
0 2020Q1 NY-West 1.10
1 2020Q2 NY-West 1.13
2 2020Q3 NY-West 0.87
3 2020Q1 NY-East 1.35
4 2020Q2 NY-East 1.21
5 2020Q3 NY-East 1.11
I don't think this is the best way to do this, since this is a naive implementation. Is there a (vectorized/pandas built-in functions)-way to achieve this?
Upvotes: 1
Views: 66
Reputation: 862781
Use DataFrame.merge
with left_on
and right_on
, then get all 4 column in output:
df = houses_df.merge(HPindex_df,
left_on=['Period','Region'],
right_on=['Periods','Regions'],
how='left')
df['HousePrice'] = df['HousePrice'] * df['PriceIndex']
print (df)
HousePrice Period Region Periods Regions PriceIndex
0 110000.0 2020Q1 NY-West 2020Q1 NY-West 1.10
1 302500.0 2020Q2 NY-East 2020Q2 NY-East 1.21
2 137500.0 2020Q1 NY-West 2020Q1 NY-West 1.10
3 355200.0 2020Q3 NY-East 2020Q3 NY-East 1.11
For avoid it is possible use rename
:
d = {'Periods':'Period','Regions':'Region'}
df = houses_df.merge(HPindex_df.rename(columns=d), on=['Period','Region'], how='left')
df['HousePrice'] = df['HousePrice'] * df['PriceIndex']
print (df)
HousePrice Period Region PriceIndex
0 110000.0 2020Q1 NY-West 1.10
1 302500.0 2020Q2 NY-East 1.21
2 137500.0 2020Q1 NY-West 1.10
3 355200.0 2020Q3 NY-East 1.11
Or DataFrame.join
with DataFrame.set_index
:
df = houses_df.join(HPindex_df.set_index(['Periods','Regions']), on=['Period','Region'])
df['HousePrice'] = df['HousePrice'] * df['PriceIndex']
print (df)
HousePrice Period Region PriceIndex
0 110000.0 2020Q1 NY-West 1.10
1 302500.0 2020Q2 NY-East 1.21
2 137500.0 2020Q1 NY-West 1.10
3 355200.0 2020Q3 NY-East 1.11
Upvotes: 2