Polaster

Reputation: 57

How can I replace a null value in a group?

I created this dataframe and calculated the price gap I was looking for, but some flats have the same price, so I get a price difference of 0. How could I replace the value 0 with the difference from the next lower price in the same group?

for example:

neighborhood:a, bed:1, bath:1, price:5

neighborhood:a, bed:1, bath:1, price:5

neighborhood:a, bed:1, bath:1, price:3

neighborhood:a, bed:1, bath:1, price:2

I get price differences of 0, 2, 1, NaN and I'm looking for 2, 2, 1, NaN (in short, I don't want to compare two flats with the same price).

Thanks in advance and good day.

import pandas as pd

data = [
    [1,'a',1,1,5],[2,'a',1,1,5],[3,'a',1,1,4],[4,'a',1,1,2],[5,'b',1,2,6],[6,'b',1,2,6],[7,'b',1,2,3]
]
df = pd.DataFrame(data, columns=['id','neighborhoodname', 'beds', 'baths', 'price'])

# group by the columns that exist in df ('neighborhoodname', not 'city')
df['difference_price'] = ( df.dropna()
                             .sort_values('price',ascending=False)
                             .groupby(['neighborhoodname','beds','baths'])['price'].diff(-1) )

Upvotes: 1

Views: 146

Answers (1)

jezrael

Reputation: 863166

I think you can first remove duplicates across all the columns used for the groupby plus price, compute the diff in a new column on the filtered data, and finally merge back to the original with a left join:

df1 = (df.dropna()
         .sort_values('price',ascending=False)
         .drop_duplicates(['neighborhoodname','beds','baths', 'price']))

df1['difference_price']  = df1.groupby(['neighborhoodname','beds','baths'])['price'].diff(-1)

df = df.merge(df1[['neighborhoodname','beds','baths','price', 'difference_price']], how='left')
print (df)
   id neighborhoodname  beds  baths  price  difference_price
0   1                a     1      1      5               1.0
1   2                a     1      1      5               1.0
2   3                a     1      1      4               2.0
3   4                a     1      1      2               NaN
4   5                b     1      2      6               3.0
5   6                b     1      2      6               3.0
6   7                b     1      2      3               NaN
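
To check the idea against the written example from the question (prices 5, 5, 3, 2 in one group), here is a minimal self-contained sketch of the same deduplicate, diff, merge steps. The names `keys`, `unique_prices` and `result` are only for illustration, and the explicit `on=` is my addition to make the join columns visible:

```python
import pandas as pd

# The four flats from the question's written example (prices 5, 5, 3, 2).
df = pd.DataFrame({
    "neighborhoodname": ["a", "a", "a", "a"],
    "beds": [1, 1, 1, 1],
    "baths": [1, 1, 1, 1],
    "price": [5, 5, 3, 2],
})
keys = ["neighborhoodname", "beds", "baths"]

# One row per distinct price in each group, highest price first.
unique_prices = (df.sort_values("price", ascending=False)
                   .drop_duplicates(keys + ["price"])
                   .copy())

# Gap to the next lower distinct price within the group.
unique_prices["difference_price"] = unique_prices.groupby(keys)["price"].diff(-1)

# Left join so every original row, duplicates included, receives its gap.
result = df.merge(unique_prices, on=keys + ["price"], how="left")
print(result["difference_price"].tolist())  # [2.0, 2.0, 1.0, nan]
```

Both flats priced at 5 get the gap to 3, which is the 2, 2, 1, NaN asked for.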

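Or you can use a lambda function that replaces the 0 values with NaN and back fills them per group. Filling inside each group avoids wrong output for one-row groups or a trailing NaN, because values cannot be pulled in from another group:

import numpy as np

# group_keys=False keeps df's row index on the result (pandas 2.0+ otherwise prepends the group keys)
df['difference_price'] = (df.sort_values('price',ascending=False)
                            .groupby(['neighborhoodname','beds','baths'], group_keys=False)['price']
                            .apply(lambda x: x.diff(-1).replace(0, np.nan).bfill()))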

print (df)     
   id neighborhoodname  beds  baths  price  difference_price
0   1                a     1      1      5               1.0
1   2                a     1      1      5               1.0
2   3                a     1      1      4               2.0
3   4                a     1      1      2               NaN
4   5                b     1      2      6               3.0
5   6                b     1      2      6               3.0
6   7                b     1      2      3               NaN
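
If assigning the `apply` result back gives you alignment trouble on your pandas version, a variant with `transform` may be easier, since it returns a result indexed like the rows it was given. This is only a sketch under that assumption; the helper name `gap_to_next_lower` is made up for the example:

```python
import numpy as np
import pandas as pd

data = [[1, "a", 1, 1, 5], [2, "a", 1, 1, 5], [3, "a", 1, 1, 4], [4, "a", 1, 1, 2],
        [5, "b", 1, 2, 6], [6, "b", 1, 2, 6], [7, "b", 1, 2, 3]]
df = pd.DataFrame(data, columns=["id", "neighborhoodname", "beds", "baths", "price"])
keys = ["neighborhoodname", "beds", "baths"]


def gap_to_next_lower(prices):
    # prices for one group, already sorted from high to low:
    # a 0 diff marks a duplicate price, so blank it and take the next real gap
    return prices.diff(-1).replace(0, np.nan).bfill()


# transform keeps the index of the sorted rows, so the assignment aligns on df's index
df["difference_price"] = (df.sort_values("price", ascending=False)
                            .groupby(keys)["price"]
                            .transform(gap_to_next_lower))
print(df["difference_price"].tolist())  # [1.0, 1.0, 2.0, nan, 3.0, 3.0, nan]
```

The back fill runs inside each group, so the NaN on the cheapest flat of group a is not filled from group b.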

Upvotes: 1
