secretive
secretive

Reputation: 2120

Transform pandas dataframe into different format

I have a dataframe like this

             col1  col2  col3  col4
id Category                        
a  blue         4     1     3     0
   red          1     0     0     4
b  red          0     1     8     5

which can be made using the following code

df = pd.DataFrame({ 'id': ['a','a','b'],'Category': ['red','blue','red'], 'col1': [1,4,0], 'col2': [0,1,1],'col3' : [0,3,8], 'col4': [4,0,5]})
sum_df = df.groupby(['id','Category']).agg({'col1': 'sum', 'col2': 'sum','col3': 'sum', 'col4': 'sum'})

I want the output to look like this

  id  red_col1  red_col2  red_col3  red_col4  blue_col1  blue_col2  blue_col3  blue_col4
0  a         1         0         0         4        4.0        1.0        3.0        0.0
1  b         0         1         8         5        NaN        NaN        NaN        NaN

Where I want every id to be unique for a row and the values summed up into respective columns. There are 1000s of ids in the dataset and 1000s of categories too. If a column does not have a value associated with it, then it should be blank i.e. None.

Upvotes: 3

Views: 398

Answers (3)

Omotayo Samson
Omotayo Samson

Reputation: 69

Unstack using

df = df.unstack()

Then you can add the prefix

df = df.add_prefix(category'_')

This will fix

Upvotes: 0

BENY
BENY

Reputation: 323266

IIUC

s=df.unstack().sort_index(level=1,axis=1)
s.columns=s.columns.map('{0[1]}_{0[0]}'.format) 
s
Out[136]: 
    blue_col1  blue_col2  blue_col3  ...  red_col2  red_col3  red_col4
id                                   ...                              
a         4.0        1.0        3.0  ...       0.0       0.0       4.0
b         NaN        NaN        NaN  ...       1.0       8.0       5.0
[2 rows x 8 columns]

Upvotes: 1

ansev
ansev

Reputation: 30920

Use DataFrame.unstack and then rename columns:

new_df = df.unstack('Category')
new_df.columns = [f'{color}_{col}' for col, color in new_df.columns]
new_df=new_df.sort_index(axis=1).reset_index()
print(new_df)

  id  blue_col1  blue_col2  blue_col3  blue_col4  red_col1  red_col2  \
0  a        4.0        1.0        3.0        0.0       1.0       0.0   
1  b        NaN        NaN        NaN        NaN       0.0       1.0   

   red_col3  red_col4  
0       0.0       4.0  
1       8.0       5.0  

Upvotes: 2

Related Questions