Sam Comber

Reputation: 1293

Create new column from specific rows in pandas dataframe

I have a CSV file in which each property row is followed by a variable number of rows describing the rooms in that property. I want to create a column that, for each property, sums the floor area of its rooms to give the gross floor area. Because the property and room rows are interleaved in a single table, I'm finding this difficult to achieve in pandas. Here is an example of the table I have at the moment:

id  ba  store_desc      floor_area
0   1   Toy Shop        NaN
1   2   Retail Zone A   29.42
2   2   Retail Zone B   31.29
3   1   Grocery Store   NaN
4   2   Retail Zone A   68.00
5   2   Outside Garden  83.50
6   2   Office          7.30

Here is the table I am trying to create:

id  ba  store_desc      floor_area   gross_floor_area
0   1   Toy Shop        NaN          60.71
3   1   Grocery Store   NaN          158.8

Does anybody have any pointers on how to achieve this result? I'm totally lost.

Sam

Upvotes: 0

Views: 1939

Answers (2)

Nathan H

Reputation: 346

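I first made a temporary column named category that holds the store_desc of each property row (the rows where floor_area is NaN). I then forward filled it so every room row carries its property's name, grouped by that column to get the sum, and mapped the sums back to the matching store_desc values.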

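# Property rows have no floor_area; copy their names into a helper column
df['category'] = df.loc[df['floor_area'].isnull(), 'store_desc']

# Forward fill so each room row is labelled with its property
# (fillna(method='ffill', inplace=True) is deprecated in recent pandas)
df['category'] = df['category'].ffill()

# Sum only floor_area per property, then map the totals onto the property rows
df['gross_floor_area'] = df['store_desc'].map(df.groupby('category')['floor_area'].sum())

df = df.drop(columns='category')

df[df['gross_floor_area'].notnull()]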
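
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same forward-fill idea. The sample DataFrame is rebuilt from the table in the question, and the names is_property, category, totals and result are my own choices for illustration, not part of the original snippet.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; property rows have a NaN floor_area.
df = pd.DataFrame({
    "id": range(7),
    "ba": [1, 2, 2, 1, 2, 2, 2],
    "store_desc": ["Toy Shop", "Retail Zone A", "Retail Zone B",
                   "Grocery Store", "Retail Zone A", "Outside Garden", "Office"],
    "floor_area": [np.nan, 29.42, 31.29, np.nan, 68.00, 83.50, 7.30],
})

# Assumption: a row is a property row exactly when floor_area is missing.
is_property = df["floor_area"].isna()

# Keep the name on property rows only, then forward fill it onto the room rows.
category = df["store_desc"].where(is_property).ffill()

# Sum room areas per property name (sum() skips the NaN on the property row).
totals = df["floor_area"].groupby(category).sum()

# Keep only the property rows and attach the total looked up by name.
result = df[is_property].assign(
    gross_floor_area=df.loc[is_property, "store_desc"].map(totals)
)
print(result)
```

This should leave just the Toy Shop and Grocery Store rows with their totals. Because the lookup is by name, it assumes property names are unique; two properties with the same store_desc would have their rooms summed together.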

Upvotes: 1

BENY

Reputation: 323226

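IIUC, you can group on a running count of the NaN rows, since each NaN in floor_area marks the start of a new property: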

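# .copy() so that adding a column does not raise SettingWithCopyWarning
df1=df[df['floor_area'].isnull()].copy()

# cumsum() over the NaN mask gives every property and its rooms the same group number
df1['gross_floor_area']=df.groupby(df['floor_area'].isnull().cumsum())['floor_area'].sum().values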

df1
Out[463]: 
   id  ba    store_desc  floor_area  gross_floor_area
0   0   1       ToyShop         NaN             60.71
3   3   1  GroceryStore         NaN            158.80
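
A self-contained sketch of the same cumsum grouping, in case it helps to see it on the question's data. The DataFrame is rebuilt from the question, the variable names (is_property, group_id, gross, out) are mine, and I use transform("sum") instead of .values so the totals are aligned by index rather than by position; that is a small variation on the snippet above, not a correction of it.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; property rows have a NaN floor_area.
df = pd.DataFrame({
    "id": range(7),
    "ba": [1, 2, 2, 1, 2, 2, 2],
    "store_desc": ["Toy Shop", "Retail Zone A", "Retail Zone B",
                   "Grocery Store", "Retail Zone A", "Outside Garden", "Office"],
    "floor_area": [np.nan, 29.42, 31.29, np.nan, 68.00, 83.50, 7.30],
})

# Assumption: a row is a property row exactly when floor_area is missing.
is_property = df["floor_area"].isna()

# Running count of property rows: 1, 1, 1, 2, 2, 2, 2 for this sample,
# so each property shares a group number with the rooms that follow it.
group_id = is_property.cumsum()

# Broadcast each group's total back onto every row of that group.
gross = df["floor_area"].groupby(group_id).transform("sum")

# Keep the property rows; assign() aligns the totals on the index.
out = df[is_property].assign(gross_floor_area=gross)
print(out)
```

Since the groups come from row position rather than from store_desc, this version should still work when two properties share the same name.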

Upvotes: 3
