Reputation: 631
I have a Pandas DF (brief snippet below) called df
deathtype height deaths
0 AMS 4900 1
1 AMS 5150 1
2 AMS 5300 1
3 Avalanche 5350 14
4 Avalanche 5600 4
5 Avalanche 5700 1
6 Avalanche 5800 17
7 Unexplained 8500 1
8 Unexplained 8560 1
I'm trying to reshape the data into the following;
deaths 1 4 14 17
deathtype
AMS 4900,5150,5300 0 0 0
Avalanche 5700 5600 5350 5800
Unexplained 8500, 8560 0 0 0
I know that a pivot_table can't achieve this since the aggfunc uses the mean for duplicate values which means that for all the deaths
values of 1, the mean will be recorded. A pivot_table gives me the following;
df.pivot_table(index='deathtype', columns='deaths', values='height', fill_value='0')
deaths 1 4 14 17
deathtype
AMS 5116.666667 0 0 0
Avalanche 5700.000000 5600 5350 5800
Unexplained 8530.000000 0 0 0
I'm looking for some advice on how to go about doing this. It looks like a pivot_table isn't the right approach here. Can someone please provide some pointers.
Upvotes: 1
Views: 545
Reputation: 863651
Use groupby
with aggregate by join
, then reshape by unstack
:
d = lambda x: ', '.join(x.astype(str))
df = df.groupby(['deathtype', 'deaths'])['height'].agg(d).unstack(fill_value='0')
print (df)
deaths 1 4 14 17
deathtype
AMS 4900, 5150, 5300 0 0 0
Avalanche 5700 5600 5350 5800
Unexplained 8500, 8560 0 0 0
Detail:
print (df.groupby(['deathtype', 'deaths'])['height'].agg(lambda x: ', '.join(x.astype(str))))
deathtype deaths
AMS 1 4900, 5150, 5300
Avalanche 1 5700
4 5600
14 5350
17 5800
Unexplained 1 8500, 8560
Name: height, dtype: object
Another solution with pivot_table
:
df = df.pivot_table(index='deathtype',
columns='deaths',
values='height',
fill_value='0',
aggfunc=lambda x: ', '.join(x.astype(str)))
Upvotes: 1