Reputation: 389
I'm not a pandas expert, so I'm having difficulty knowing exactly how to accomplish what, I would think, is a very simple DataFrame manipulation. I have a DataFrame like below:
Id appName _RT _count
0 ef9fdeed-a9ad-4680-b8e9-9e5e10ae2593 App1 0.003000 1
1 ef9fdeed-a9ad-4680-b8e9-9e5e10ae2593 App2 0.026000 1
2 ef9fdeed-a9ad-4680-b8e9-9e5e10ae2593 App3 0.029000 1
3 ef9fdeed-a9ad-4680-b8e9-9e5e10ae2593 App2 0.052000 2
4 ef9fdeed-a9ad-4680-b8e9-9e5e10ae2593 App2 0.070000 1
5 ef9fdeed-a9ad-4680-b8e9-9e5e10ae2593 App1 0.124000 2
6 f03eaf42-9080-43c8-a892-f910fa442ee6 App3 0.060000 1
What I'm trying to accomplish is creating new columns for appName for _RT and _count like so:
Id appName App1_RT App1_count App2_RT App2_count App3_RT App3_count
0 ef9fdeed-a9ad-4680-b8e9-9e5e10ae2593 App1 0.003000 1
1 ef9fdeed-a9ad-4680-b8e9-9e5e10ae2593 App2 0.026000 1
2 ef9fdeed-a9ad-4680-b8e9-9e5e10ae2593 App3 0.029000 1
3 ef9fdeed-a9ad-4680-b8e9-9e5e10ae2593 App2 0.052000 2
4 ef9fdeed-a9ad-4680-b8e9-9e5e10ae2593 App2 0.070000 1
5 ef9fdeed-a9ad-4680-b8e9-9e5e10ae2593 App1 0.124000 2
6 f03eaf42-9080-43c8-a892-f910fa442ee6 App3 0.060000 1
My first thought is to use transpose()
, but I feel like there has to be some other way to do this. Honestly, I'm just kind of lost on the best way to approach this. Any help would be greatly appreciated. Thank you.
Upvotes: 1
Views: 163
Reputation: 879251
df = df.set_index(['Id', 'appName'], append=True).unstack()
df.columns = ['{}{}'.format(name, id) for id, name in df.columns]
yields
App1_RT App2_RT App3_RT App1_count App2_count App3_count
Id
0 ef9fdeed 0.003 NaN NaN 1 NaN NaN
1 ef9fdeed NaN 0.026 NaN NaN 1 NaN
2 ef9fdeed NaN NaN 0.029 NaN NaN 1
3 ef9fdeed NaN 0.052 NaN NaN 2 NaN
4 ef9fdeed NaN 0.070 NaN NaN 1 NaN
5 ef9fdeed 0.124 NaN NaN 2 NaN NaN
6 f03eaf42 NaN NaN 0.060 NaN NaN 1
The Id
s were truncated for readability.
Notice that values in the appName
column are column names in the desired DataFrame.
Moving values from a column into the column index can be performed in two steps:
Move the values into the index using set_index
Move the index level values into a column index using unstack
.
Once that is done, you just need to rename the columns to compress the MultiIndex into a single index level. That was done by re-assignment to df.columns
. Note that you may prefer to skip that step, as the MultiIndex provides richer structure which may be useful at later stages if you need to select columns based on appName
or count
or RT
.
The Id
column was also placed in the index to "protect" it from getting split up by the unstack
operation. The easiest way to see what I mean is to take a look at
df.set_index(['appName'], append=True).unstack()
in a Python interpreter:
In [144]: df.set_index(['appName'], append=True).unstack()
Out[144]:
Id _RT _count
appName App1 App2 App3 App1 App2 App3 App1 App2 App3
0 ef9fdeed NaN NaN 0.003 NaN NaN 1 NaN NaN
1 NaN ef9fdeed NaN NaN 0.026 NaN NaN 1 NaN
2 NaN NaN ef9fdeed NaN NaN 0.029 NaN NaN 1
3 NaN ef9fdeed NaN NaN 0.052 NaN NaN 2 NaN
4 NaN ef9fdeed NaN NaN 0.070 NaN NaN 1 NaN
5 ef9fdeed NaN NaN 0.124 NaN NaN 2 NaN NaN
6 NaN NaN f03eaf42 NaN NaN 0.060 NaN NaN 1
versus
In [145]: df.set_index(['Id', 'appName'], append=True).unstack()
Out[145]:
_RT _count
appName App1 App2 App3 App1 App2 App3
Id
0 ef9fdeed 0.003 NaN NaN 1 NaN NaN
1 ef9fdeed NaN 0.026 NaN NaN 1 NaN
2 ef9fdeed NaN NaN 0.029 NaN NaN 1
3 ef9fdeed NaN 0.052 NaN NaN 2 NaN
4 ef9fdeed NaN 0.070 NaN NaN 1 NaN
5 ef9fdeed 0.124 NaN NaN 2 NaN NaN
6 f03eaf42 NaN NaN 0.060 NaN NaN 1
Upvotes: 2