Reputation: 57
I have a pandas database of apartment building sales, one column is the price and another column is the date sold. Some of these sales were for multiple properties, however the price listed for each property reflects the total sale price of multiple properties. These bundle deals can be further identified by the date which the sale took place on.
For example:
Address | Price | Date Sold | Tax Assessed Value |
---|---|---|---|
301-303 EAST 4TH STREET | 3672530 | 11/24/2020 | 3420000 |
9 AVENUE B | 1250000 | 06/16/2020 | 650000 |
11 AVENUE B | 1250000 | 06/16/2020 | 800000 |
231-233 EAST 4TH STREET | 2500000 | 06/16/2020 | 5111000 |
I've so far identified all duplicates in the dataframe by using:
df[df.duplicated(['Price', 'Date Sold'], keep = False)]
Which returns:
Address | Price | Date Sold | Tax Assessed Value |
---|---|---|---|
9 AVENUE B | 1250000 | 06/16/2020 | 650000 |
11 AVENUE B | 1250000 | 06/16/2020 | 800000 |
There are many bundle deals within the database with varying numbers of buildings. I'd like to estimate and update the price for each building within a bundle by using its proportion of the total tax assessed value for the bundle multiplied by the price value.
ex. (650000/(650000+800000))*1250000 = 560344.8
So, I'd end up with:
Address | Price | Date Sold | Tax Assessed Value |
---|---|---|---|
9 AVENUE B | 560344.8 | 06/16/2020 | 650000 |
11 AVENUE B | 689655.2 | 06/16/2020 | 800000 |
I've found some previous questions on how to replace the whole row or one column value, but ultimately I'm pretty lost when it comes to identifying each bundle and calculating the proportion.
Upvotes: 3
Views: 106
Reputation: 150825
Try:
df['Price'] *= (df['Tax Assessed Value'] /
df.groupby(['Price', 'Date Sold'])['Tax Assessed Value'].transform('sum')
)
but I think you need to identify exactly what you mean by duplicates
Upvotes: 1