Reputation: 1293
I have a CSV file in which each property row is followed by a variable number of rows describing the rooms in that property. I want to create a column that, for each property, sums the floor area of its rooms to give a gross floor area. The unstructured nature of the data is making this difficult to achieve in pandas. Here is an example of the table I have at the moment:
id  ba  store_desc      floor_area
0   1   Toy Shop        NaN
1   2   Retail Zone A   29.42
2   2   Retail Zone B   31.29
3   1   Grocery Store   NaN
4   2   Retail Zone A   68.00
5   2   Outside Garden  83.50
6   2   Office          7.30
Here is the table I am trying to create:
id  ba  store_desc     floor_area  gross_floor_area
0   1   Toy Shop       NaN         60.71
3   1   Grocery Store  NaN         158.80
Does anybody have any pointers on how to achieve this result? I'm totally lost.
Sam
Upvotes: 0
Views: 1939
Reputation: 346
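I first made a temporary column named category that holds store_desc only on the property rows (the rows where floor_area is null), forward-filled it so each room row carries its property's name, grouped by that column to sum floor_area, and then mapped the sums back onto the matching store_desc values.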
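# Label each property row (the rows with no floor_area) with its own store_desc
df['category'] = df['store_desc'].where(df['floor_area'].isnull())
# Forward-fill so every room row carries the label of the property above it
df['category'] = df['category'].ffill()
# Sum floor_area per property and map the totals back onto the property rows
df['gross_floor_area'] = df['store_desc'].map(df.groupby('category')['floor_area'].sum())
df = df.drop(columns='category')
df[df['gross_floor_area'].notnull()]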
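One caveat with mapping on store_desc: two properties that happen to share a description would have their rooms summed together. Below is a minimal, self-contained sketch of the same forward-fill idea that keys on the id column instead. The sample frame is reconstructed from the question, the names is_property, owner, totals and result are only illustrative, and it assumes the first row is a property row.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the table in the question
df = pd.DataFrame({
    "id": range(7),
    "ba": [1, 2, 2, 1, 2, 2, 2],
    "store_desc": ["Toy Shop", "Retail Zone A", "Retail Zone B",
                   "Grocery Store", "Retail Zone A", "Outside Garden", "Office"],
    "floor_area": [np.nan, 29.42, 31.29, np.nan, 68.00, 83.50, 7.30],
})

# Property rows are the ones without a floor_area
is_property = df["floor_area"].isna()

# Keep id only on property rows, then carry it down to the rooms below.
# Assumption: the first row is a property row, so no NaN is left after ffill.
owner = df["id"].where(is_property).ffill().astype("int64")

# Sum the room areas per owning property (NaN is skipped by sum)
totals = df.groupby(owner)["floor_area"].sum()

# Attach the totals to the property rows only
result = df[is_property].copy()
result["gross_floor_area"] = result["id"].map(totals)
print(result)
```

Because the grouping key is the property's id rather than its name, the second "Retail Zone A" row is still counted under Grocery Store only.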
Upvotes: 1
Reputation: 323226
IIUC, you can group on a running count of the NaN rows:
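# Property rows are the ones with no floor_area; copy to avoid SettingWithCopyWarning
df1 = df[df['floor_area'].isnull()].copy()
# Each NaN bumps the cumulative count, so a property and its rooms share one group
df1['gross_floor_area'] = df.groupby(df['floor_area'].isnull().cumsum())['floor_area'].sum().values
df1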
Out[463]:
   id  ba    store_desc  floor_area  gross_floor_area
0   0   1       ToyShop         NaN             60.71
3   3   1  GroceryStore         NaN            158.80
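To see why the cumulative sum works as a grouping key, here is a small self-contained sketch that prints the key before using it. The sample frame is rebuilt from the question, and the names block and totals are only illustrative. It relies on the groups coming out in the same order as the property rows, which holds here because the key never decreases.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the table in the question
df = pd.DataFrame({
    "id": range(7),
    "ba": [1, 2, 2, 1, 2, 2, 2],
    "store_desc": ["Toy Shop", "Retail Zone A", "Retail Zone B",
                   "Grocery Store", "Retail Zone A", "Outside Garden", "Office"],
    "floor_area": [np.nan, 29.42, 31.29, np.nan, 68.00, 83.50, 7.30],
})

# Each NaN floor_area starts a new block: the counter goes 1, 1, 1, 2, 2, 2, 2
block = df["floor_area"].isna().cumsum()
print(block.tolist())

# sum() skips NaN, so each block total is just the sum of its room rows
totals = df.groupby(block)["floor_area"].sum()

# One total per property row, assigned positionally in block order
df1 = df[df["floor_area"].isna()].copy()
df1["gross_floor_area"] = totals.to_numpy()
print(df1)
```

If you would rather keep every row, `df.groupby(block)["floor_area"].transform("sum")` gives the same totals aligned to the original index.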
Upvotes: 3