The Great
The Great

Reputation: 7733

How to use agg_func = 'All' in Python Pivot_table

I have an input dataframe like given below

df = pd.DataFrame({'person_id' :[1,1,1,2,2,2,2,2,2],'level_1': ['L1FR','L1Date','L1value','L1FR','L1Date','L1value','L2FR','L2Date','L2value'], 'val3':['Fasting','11/4/2005',1.33,'Random','18/1/2007',4.63,'Fasting','18/1/2017',8.63]})

It looks like as shown below

enter image description here

I would like to convert my output to look like as shown below

enter image description here

This is what I tried based on response from SO forum

g = df.level_1.str[-2:]
df.pivot_table(index='person_id', columns=g, values='val3',aggfunc='first')

This provides only the first record of each group (or person) like shown below which is very close to my expected output.Basically I am trying to do sort of transformation/rearrange the way input data is shown

enter image description here

but I want to have all the records. In real data, I have more than 100k records and each person can be repeated multiple times.

When I use g.groupby(g).cumcount() as index, I get NA's for half of the records and I would like to avoid it. I don't know why this issue happens.

Any other way to approach this is also helpful. Can you help me get this output?

Upvotes: 2

Views: 191

Answers (1)

Clem G.
Clem G.

Reputation: 396

You're pretty close to have what you want.

The trick here is to add another index to your second level like this :

df = pd.DataFrame({'person_id' :[1,1,1,2,2,2,2,2,2],'level_1': ['L1FR','L1Date','L1value','L1FR','L1Date','L1value','L2FR','L2Date','L2value'], 'val3':['Fasting','11/4/2005',1.33,'Random','18/1/2007',4.63,'Fasting','18/1/2017',8.63]})
g = df.level_1.str[-2:]
# Extracting level's number
df['lvl'] = df.level_1.apply(lambda x: int(''.join(filter(str.isdigit, x))))
# Then you pivot with person_id and lvl
df = df.pivot_table(index=['person_id', 'lvl'], columns=g, values='val3', aggfunc='first')

Output should be :

            level_1     FR          te          ue
person_id   lvl             
        1   1       Fasting     11/4/2005   1.33
        2   1           Random      18/1/2007   4.63
            2           Fasting     18/1/2017   8.63

Then if you reset the level 1 index like this :

df.reset_index(level=1).drop("lvl", axis=1)

The output is :

level_1     FR          te          ue
person_id           
1           Fasting     11/4/2005   1.33
2           Random      18/1/2007   4.63
2           Fasting     18/1/2017   8.63

And there you go !

Upvotes: 3

Related Questions