Aman Singh
Aman Singh

Reputation: 1241

Pivoting pandas with removal of some headers and renaming of some indexes

I have the following pandas dataframe:

    count  event      date
0    1544  'strike'   2016-11-01
1     226  'defense'  2016-11-01
2    1524  'strike'   2016-12-01
3     246  'defense'  2016-12-01
4    1592  'strike'   2017-01-01
5     245  'defense'  2017-01-01

I want to pivot/transform it in such a way the final output looks like this:

event     2016-11-01 2016-12-01 2017-01-01 2017-02-01 2017-03-01                                                                     
'strike'        1544       1524       1592       1608       1654   
'defense'        226        246        245        210        254

but what i'm getting now upon pivoting is this:

                        count          count            count           count             count\
date          2016-11-01 2016-12-01 2017-01-01 2017-02-01 2017-03-01   
event                                                                  
'strike'            1544       1524       1592       1608       1654   
'defense'            226        246        245        210        254   

is there any way i could remove the entire empty row ahead of the event index-name and rename the date index-name with event as its index-name and also remove the unwanted count appearing in the first row of the data frame? The data seems to be transforming correctly i just want to get rid of these headers and indexes and have the renamed and removed properly. I also don't want the row labels in the desired output.

This is what i've been trying till now:

output = df.pivot(index='event', columns='date')
print(output)

Upvotes: 4

Views: 13485

Answers (2)

Nolatar
Nolatar

Reputation: 31

I would recommend using the more general version of pd.pivot(), which is pd.pivot_table(), like so:

x = pd.pivot_table(df, index = 'event', columns = 'date', values = 'count')

You will get:

date    01/01/2017  01/11/2016  01/12/2016
event           
'defense'   245         226        246
'strike'    1592        1544       1524

Next, you can get rid of the 'date' string by setting:

x.columns.name = ' ' 

Additionally, if you want to change the order of the events, you might want to set the variable up as a categorical variable, before doing the pivoting:

df.event = df.event.astype('category') # cast to categorical
df.event.cat.set_categories(your_list, inplace = True) # force order

where your_list is the list of your categories, in order.

Hope this helps.

Upvotes: 3

jezrael
jezrael

Reputation: 862661

Solution is add parameter values to pivot, then add reset_index for column from index and rename_axis fro remove column name:

output=df.pivot(index='event',columns='date',values='count').reset_index().rename_axis(None,1)
print(output)
       event  2016-11-01  2016-12-01  2017-01-01
0  'defense'         226         246         245
1   'strike'        1544        1524        1592

What happens if omit it?

print (df)
   count      event        date  count1
0   1544   'strike'  2016-11-01       1
1    226  'defense'  2016-11-01       7
2   1524   'strike'  2016-12-01       8
3    246  'defense'  2016-12-01       3
4   1592   'strike'  2017-01-01       0
5    245  'defense'  2017-01-01       1

pivot use each not used column and create MultiIndex for distinguish original columns:

output = df.pivot(index='event', columns='date')
print(output)
               count                           count1                      
date      2016-11-01 2016-12-01 2017-01-01 2016-11-01 2016-12-01 2017-01-01
event                                                                      
'defense'        226        246        245          7          3          1
'strike'        1544       1524       1592          1          8          0

Upvotes: 5

Related Questions