Jrob1765

Reputation: 57

Pandas- update value in a specific column based on duplicate rows

I have a pandas DataFrame of apartment building sales; one column is the price and another is the date sold. Some of these sales were bundle deals covering multiple properties, but the price listed for each property in a bundle is the total sale price of the whole bundle. These bundles can be further identified by the date on which the sale took place.

For example:

Address                  Price    Date Sold   Tax Assessed Value
301-303 EAST 4TH STREET  3672530  11/24/2020  3420000
9 AVENUE B               1250000  06/16/2020  650000
11 AVENUE B              1250000  06/16/2020  800000
231-233 EAST 4TH STREET  2500000  06/16/2020  5111000

I've so far identified all duplicates in the dataframe by using:

df[df.duplicated(['Price', 'Date Sold'], keep=False)]

Which returns:

Address      Price    Date Sold   Tax Assessed Value
9 AVENUE B   1250000  06/16/2020  650000
11 AVENUE B  1250000  06/16/2020  800000

There are many bundle deals in the DataFrame, with varying numbers of buildings. I'd like to estimate and update the price of each building in a bundle by multiplying the bundle's price by that building's share of the bundle's total tax assessed value.

e.g. (650000 / (650000 + 800000)) * 1250000 = 560344.8

So, I'd end up with:

Address      Price     Date Sold   Tax Assessed Value
9 AVENUE B   560344.8  06/16/2020  650000
11 AVENUE B  689655.2  06/16/2020  800000

I've found some previous questions on how to replace a whole row or a single column value, but I'm pretty lost when it comes to identifying each bundle and calculating the proportions.

Upvotes: 3

Views: 106

Answers (1)

Quang Hoang

Reputation: 150825

Try:

# Scale each price by the row's share of its bundle's total assessed value;
# rows that aren't part of a bundle form single-row groups, so their factor is 1.
df['Price'] *= (df['Tax Assessed Value'] /
                df.groupby(['Price', 'Date Sold'])['Tax Assessed Value'].transform('sum')
               )

but I think you need to identify exactly what you mean by duplicates.
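
As a quick sanity check, here is how that one-liner plays out on the sample rows from the question (the DataFrame construction below is just an illustrative reconstruction of the posted data):

import pandas as pd

# Illustrative reconstruction of the sample data from the question
df = pd.DataFrame({
    'Address': ['301-303 EAST 4TH STREET', '9 AVENUE B',
                '11 AVENUE B', '231-233 EAST 4TH STREET'],
    'Price': [3672530, 1250000, 1250000, 2500000],
    'Date Sold': ['11/24/2020', '06/16/2020', '06/16/2020', '06/16/2020'],
    'Tax Assessed Value': [3420000, 650000, 800000, 5111000],
})

# Scale each price by the row's share of its bundle's total assessed value
bundle_total = df.groupby(['Price', 'Date Sold'])['Tax Assessed Value'].transform('sum')
df['Price'] = df['Price'] * df['Tax Assessed Value'] / bundle_total

print(df['Price'].round(1).tolist())
# [3672530.0, 560344.8, 689655.2, 2500000.0]

Note that 231-233 EAST 4TH STREET shares a Date Sold with the AVENUE B bundle but has a different Price, so grouping on both columns keeps it separate, which is presumably the ambiguity the caveat above is pointing at.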

Upvotes: 1
