sosukeinu
sosukeinu

Reputation: 389

Pandas rename/transpose column using value from another column

I'm not a pandas expert, so I'm having difficulty knowing exactly how to accomplish what, I would think, is a very simple DataFrame manipulation. I have a DataFrame like below:

                       Id              appName       _RT      _count
0    ef9fdeed-a9ad-4680-b8e9-9e5e10ae2593      App1  0.003000       1
1    ef9fdeed-a9ad-4680-b8e9-9e5e10ae2593      App2  0.026000       1
2    ef9fdeed-a9ad-4680-b8e9-9e5e10ae2593      App3  0.029000       1
3    ef9fdeed-a9ad-4680-b8e9-9e5e10ae2593      App2  0.052000       2
4    ef9fdeed-a9ad-4680-b8e9-9e5e10ae2593      App2  0.070000       1
5    ef9fdeed-a9ad-4680-b8e9-9e5e10ae2593      App1  0.124000       2
6    f03eaf42-9080-43c8-a892-f910fa442ee6      App3  0.060000       1

What I'm trying to accomplish is creating new columns for appName for _RT and _count like so:

                       Id              appName       App1_RT      App1_count   App2_RT  App2_count  App3_RT    App3_count
0    ef9fdeed-a9ad-4680-b8e9-9e5e10ae2593      App1  0.003000       1
1    ef9fdeed-a9ad-4680-b8e9-9e5e10ae2593      App2                           0.026000     1
2    ef9fdeed-a9ad-4680-b8e9-9e5e10ae2593      App3                                                 0.029000       1
3    ef9fdeed-a9ad-4680-b8e9-9e5e10ae2593      App2                           0.052000       2
4    ef9fdeed-a9ad-4680-b8e9-9e5e10ae2593      App2                           0.070000       1
5    ef9fdeed-a9ad-4680-b8e9-9e5e10ae2593      App1  0.124000       2
6    f03eaf42-9080-43c8-a892-f910fa442ee6      App3                                                0.060000       1

My first thought is to use transpose(), but I feel like there has to be some other way to do this. Honestly, I'm just kind of lost on the best way to approach this. Any help would be greatly appreciated. Thank you.

Upvotes: 1

Views: 163

Answers (1)

unutbu
unutbu

Reputation: 879251

df = df.set_index(['Id', 'appName'], append=True).unstack()
df.columns = ['{}{}'.format(name, id) for id, name in df.columns]

yields

            App1_RT  App2_RT  App3_RT  App1_count  App2_count  App3_count
  Id                                                                     
0 ef9fdeed    0.003      NaN      NaN           1         NaN         NaN
1 ef9fdeed      NaN    0.026      NaN         NaN           1         NaN
2 ef9fdeed      NaN      NaN    0.029         NaN         NaN           1
3 ef9fdeed      NaN    0.052      NaN         NaN           2         NaN
4 ef9fdeed      NaN    0.070      NaN         NaN           1         NaN
5 ef9fdeed    0.124      NaN      NaN           2         NaN         NaN
6 f03eaf42      NaN      NaN    0.060         NaN         NaN           1

The Ids were truncated for readability.

Notice that values in the appName column are column names in the desired DataFrame. Moving values from a column into the column index can be performed in two steps:

  1. Move the values into the index using set_index

  2. Move the index level values into a column index using unstack.

Once that is done, you just need to rename the columns to compress the MultiIndex into a single index level. That was done by re-assignment to df.columns. Note that you may prefer to skip that step, as the MultiIndex provides richer structure which may be useful at later stages if you need to select columns based on appName or count or RT.


The Id column was also placed in the index to "protect" it from getting split up by the unstack operation. The easiest way to see what I mean is to take a look at df.set_index(['appName'], append=True).unstack() in a Python interpreter:

In [144]: df.set_index(['appName'], append=True).unstack()
Out[144]: 
               Id                        _RT               _count          
appName      App1      App2      App3   App1   App2   App3   App1 App2 App3
0        ef9fdeed       NaN       NaN  0.003    NaN    NaN      1  NaN  NaN
1             NaN  ef9fdeed       NaN    NaN  0.026    NaN    NaN    1  NaN
2             NaN       NaN  ef9fdeed    NaN    NaN  0.029    NaN  NaN    1
3             NaN  ef9fdeed       NaN    NaN  0.052    NaN    NaN    2  NaN
4             NaN  ef9fdeed       NaN    NaN  0.070    NaN    NaN    1  NaN
5        ef9fdeed       NaN       NaN  0.124    NaN    NaN      2  NaN  NaN
6             NaN       NaN  f03eaf42    NaN    NaN  0.060    NaN  NaN    1

versus

In [145]: df.set_index(['Id', 'appName'], append=True).unstack()
Out[145]: 
              _RT               _count          
appName      App1   App2   App3   App1 App2 App3
  Id                                            
0 ef9fdeed  0.003    NaN    NaN      1  NaN  NaN
1 ef9fdeed    NaN  0.026    NaN    NaN    1  NaN
2 ef9fdeed    NaN    NaN  0.029    NaN  NaN    1
3 ef9fdeed    NaN  0.052    NaN    NaN    2  NaN
4 ef9fdeed    NaN  0.070    NaN    NaN    1  NaN
5 ef9fdeed  0.124    NaN    NaN      2  NaN  NaN
6 f03eaf42    NaN    NaN  0.060    NaN  NaN    1

Upvotes: 2

Related Questions