Reputation: 2120
I have a dataframe like this
col1 col2 col3 col4
id Category
a blue 4 1 3 0
red 1 0 0 4
b red 0 1 8 5
which can be made using the following code
df = pd.DataFrame({ 'id': ['a','a','b'],'Category': ['red','blue','red'], 'col1': [1,4,0], 'col2': [0,1,1],'col3' : [0,3,8], 'col4': [4,0,5]})
sum_df = df.groupby(['id','Category']).agg({'col1': 'sum', 'col2': 'sum','col3': 'sum', 'col4': 'sum'})
I want the output to look like this
id red_col1 red_col2 red_col3 red_col4 blue_col1 blue_col2 blue_col3 blue_col4
0 a 1 0 0 4 4.0 1.0 3.0 0.0
1 b 0 1 8 5 NaN NaN NaN NaN
Where I want every id
to be unique for a row and the values summed up into respective columns. There are 1000s of ids in the dataset and 1000s of categories too. If a column does not have a value associated with it, then it should be blank i.e. None
.
Upvotes: 3
Views: 398
Reputation: 69
Unstack using
df = df.unstack()
Then you can add the prefix
df = df.add_prefix(category'_')
This will fix
Upvotes: 0
Reputation: 323266
IIUC
s=df.unstack().sort_index(level=1,axis=1)
s.columns=s.columns.map('{0[1]}_{0[0]}'.format)
s
Out[136]:
blue_col1 blue_col2 blue_col3 ... red_col2 red_col3 red_col4
id ...
a 4.0 1.0 3.0 ... 0.0 0.0 4.0
b NaN NaN NaN ... 1.0 8.0 5.0
[2 rows x 8 columns]
Upvotes: 1
Reputation: 30920
Use DataFrame.unstack
and then rename columns:
new_df = df.unstack('Category')
new_df.columns = [f'{color}_{col}' for col, color in new_df.columns]
new_df=new_df.sort_index(axis=1).reset_index()
print(new_df)
id blue_col1 blue_col2 blue_col3 blue_col4 red_col1 red_col2 \
0 a 4.0 1.0 3.0 0.0 1.0 0.0
1 b NaN NaN NaN NaN 0.0 1.0
red_col3 red_col4
0 0.0 4.0
1 8.0 5.0
Upvotes: 2