AJG519
AJG519

Reputation: 3379

Pivot Pandas Dataframe and calculate 'columns' parameter

Say I have the following dataframe:

import pandas as pd

df = pd.DataFrame()
df['ID'] = [1, 1, 1, 2, 2]
df['Type'] = ['A', 'B', 'Q', 'B', 'R']
df['Status'] = [0, 0, 1, 0, 1]
>>> df
   ID Type  Status
0   1    A       0
1   1    B       0
2   1    Q       1
3   2    B       0
4   2    R       1
>>> 

I want to group this dataframe by 'ID' and reshape it so that I have a "Type" variable and a "Status" variable for each item within the group. See below:

   Type1 Type2 Type3  Status1  Status2  Status3
ID                                             
1      A     B     Q        0        0        1
2      B     R   NaN        0        1      NaN

The number of rows in my output dataframe will depend on the max number of records in any one group of IDs.

I believe that the pivot function is what I want to use here. However, it calls for a "columns" parameter which I believe should be an ID of each item within each group. I have a very clunky way of calculating this, which I appreciate advice on a better way of doing this:

>>> g=df.groupby('ID')
>>> df['IDinGroup']=[item for sublist in [range(1,len(i[1])+1) for i in g] for item in sublist]
>>> df
   ID Type  Status  IDinGroup
0   1    A       0          1
1   1    B       0          2
2   1    Q       1          3
3   2    B       0          1
4   2    R       1          2
>>> 

Then I can loop through the 'Type' and 'Status' variables and pivot each and then merge them back together:

>>> ListOfValues=[]
>>> for ValueCol in ['Type','Status']:
...     f=df.pivot(index='ID',columns='IDinGroup',values=ValueCol)
...     f.columns=[ValueCol+str(Col) for Col in f.columns]
...     f.columns.name=None
...     ListOfValues.append(f)
... 
>>> pd.concat(ListOfValues,1)
   Type1 Type2 Type3  Status1  Status2  Status3
ID                                             
1      A     B     Q        0        0        1
2      B     R   NaN        0        1      NaN
>>> 

Am I taking the correct approach here? And if so, what is a better way to calculate the "columns" parameter for the pivot function? (the id of each item within each group)

Upvotes: 0

Views: 51

Answers (1)

HYRY
HYRY

Reputation: 97301

Try this:

import pandas as pd

df = pd.DataFrame()
df['ID'] = [1, 1, 1, 2, 2]
df['Type'] = ['A', 'B', 'Q', 'B', 'R']
df['Status'] = [0, 0, 1, 0, 1]

g = df.groupby("ID")[["Type","Status"]]
df2 = g.apply(pd.DataFrame.reset_index, drop=True).unstack()

Here is the result:

   Type         Status       
      0  1    2      0  1   2
ID                           
1     A  B    Q      0  0   1
2     B  R  NaN      0  1 NaN

The columns is a MultiIndex, if you want to flat it:

df2.columns = [l0 + str(l1+1) for l0, l1 in df2.columns]

the output:

   Type1 Type2 Type3  Status1  Status2  Status3
ID                                             
1      A     B     Q        0        0        1
2      B     R   NaN        0        1      NaN

Upvotes: 1

Related Questions