Jusstol
Jusstol

Reputation: 93

Pandas DataFrame : groupby then transpose

I have this DataFrame :

df = pd.DataFrame({'1-sensor':['608', '608', '2158', '2158'],
          '2-day':['2017-12-11', '2017-12-12', '2017-12-11', '2017-12-12'],
          '3-voltage':[30, 31, 28, 29]})

df :

  1-sensor       2-day  3-voltage
0      608  2017-12-11         30
1      608  2017-12-12         31
2     2158  2017-12-11         28
3     2158  2017-12-12         29

And I would like to make it look like this :

pd.DataFrame({'1-sensor':['608', '2158'],
              '2017-12-11':[30, 28],
              '2017-12-12':[31, 29]})

df :

  1-sensor  2017-12-11  2017-12-12
0      608          30          31
1     2158          28          29

I have tried different solutions combining .groupby('sensor') and .unstack() but not successful. Any good suggestions ?

Thanks!

Upvotes: 7

Views: 6967

Answers (2)

cs95
cs95

Reputation: 403128

Option 1
Use pivot, if there are no aggregations. This is actually more efficient -

df.pivot(index='1-sensor', columns='2-day', values='3-voltage')

2-day     2017-12-11  2017-12-12
1-sensor                        
2158              28          29
608               30          31

If there are any aggregations to be performed, use pivot_table and include aggfunc=....


Option 2
Another solution with set_index + unstack -

df.set_index(['1-sensor', '2-day']).unstack()

          3-voltage           
2-day    2017-12-11 2017-12-12
1-sensor                      
2158             28         29
608              30         31

If aggregations are required, then groupby + unstack will do it. You were this close.

df.groupby(['1-sensor', '2-day'], sort=False).sum().unstack()  

          3-voltage           
2-day    2017-12-11 2017-12-12
1-sensor                      
608              30         31
2158             28         29

Replace sum with the aggfunction you want to use, if you need to use one.

Upvotes: 6

Jacob H
Jacob H

Reputation: 607

You can also do it with pd.pivot_table:

pd.pivot_table(df, index = ['1-sensor'], columns = ['2-day'], values = '3-voltage').reset_index()

Upvotes: 2

Related Questions