Pandas- update value in a specific column based on duplicate rows

Question

I have a pandas database of apartment building sales, one column is the price and another column is the date sold. Some of these sales were for multiple properties, however the price listed for each property reflects the total sale price of multiple properties. These bundle deals can be further identified by the date which the sale took place on.

For example:

Address	Price	Date Sold	Tax Assessed Value
301-303 EAST 4TH STREET	3672530	11/24/2020	3420000
9 AVENUE B	1250000	06/16/2020	650000
11 AVENUE B	1250000	06/16/2020	800000
231-233 EAST 4TH STREET	2500000	06/16/2020	5111000

I've so far identified all duplicates in the dataframe by using:

df[df.duplicated(['Price', 'Date Sold'], keep = False)]

Which returns:

Address	Price	Date Sold	Tax Assessed Value
9 AVENUE B	1250000	06/16/2020	650000
11 AVENUE B	1250000	06/16/2020	800000

There are many bundle deals within the database with varying numbers of buildings. I'd like to estimate and update the price for each building within a bundle by using its proportion of the total tax assessed value for the bundle multiplied by the price value.

ex. (650000/(650000+800000))*1250000 = 560344.8

So, I'd end up with:

Address	Price	Date Sold	Tax Assessed Value
9 AVENUE B	560344.8	06/16/2020	650000
11 AVENUE B	689655.2	06/16/2020	800000

I've found some previous questions on how to replace the whole row or one column value, but ultimately I'm pretty lost when it comes to identifying each bundle and calculating the proportion.

Quang Hoang · Accepted Answer

Try:

df['Price'] *= (df['Tax Assessed Value'] / 
                df.groupby(['Price', 'Date Sold'])['Tax Assessed Value'].transform('sum') 
               )

but I think you need to identify exactly what you mean by duplicates

Pandas- update value in a specific column based on duplicate rows

Answers (1)

Related Questions