Reputation: 21

Python Pandas Pivot Table Sort by Date

I have the following code:

data_df  = pandas.read_csv(filename, parse_dates = True)
groupings = np.unique(data_df[['Ind']])
for group in groupings:
  data_df2 = data_df[data_df['Ind'] == group]
  table    = pandas.pivot_table(data_df2, values='Rev', index=['Ind', 'Month'], columns=['Type'], aggfunc=sum)  
  table    = table.sort_index(ascending=[0, 0])
  print(table)

How can I sort the pivot 'table' by month and year (e.g. when I print 'table' I want Dec-14 to be the first row of output for each group)?

Below is a sample of the data in 'data_df':

   Ind   Type   Month    Rev
0    A  Voice  Dec-14  10.00
1    A  Voice  Jan-15   8.00
2    A  Voice  Feb-15  13.00
3    A  Voice  Mar-15   9.00
4    A  Voice  Apr-15  11.00
5    A  Voice  May-15  14.00
6    A  Voice  Jun-15   6.00
7    A  Voice  Jul-15   4.00
8    A  Voice  Aug-15  12.00
9    A  Voice  Sep-15   7.00
10   A  Voice  Oct-15   5.00
11   A   Elec  Dec-14   8.04
12   A   Elec  Jan-15   6.95
13   A   Elec  Feb-15   7.58
14   A   Elec  Mar-15   8.81
15   A   Elec  Apr-15   8.33
16   A   Elec  May-15   9.96
17   A   Elec  Jun-15   7.24
18   A   Elec  Jul-15   4.26
19   A   Elec  Aug-15  10.84
20   A   Elec  Sep-15   4.82
21   A   Elec  Oct-15   5.68
22   B  Voice  Dec-14  10.00
23   B  Voice  Jan-15   8.00
24   B  Voice  Feb-15  13.00
25   B  Voice  Mar-15   9.00
26   B  Voice  Apr-15  11.00
27   B  Voice  May-15  14.00
28   B  Voice  Jun-15   6.00
29   B  Voice  Jul-15   4.00
..  ..    ...     ...    ...

The output is (I was playing with ascending but it only wants to sort alpha):

Type        Elec  Voice
Ind Month               
A   Sep-15   4.82      7
    Oct-15   5.68      5
    May-15   9.96     14
    Mar-15   8.81      9
    Jun-15   7.24      6
    Jul-15   4.26      4
    Jan-15   6.95      8
    Feb-15   7.58     13
    Dec-14   8.04     10
    Aug-15  10.84     12
    Apr-15   8.33     11

I want the output to be sorted by date:

Type        Elec  Voice
Ind Month               
A   Dec-14   8.04     10
    Jan-15   6.95      8
    Feb-15   7.58     13
    ...

Upvotes: 2

Answers (3)

jezrael

Reputation: 863166

I think you can first convert column Month to_datetime and then to_period:

data_df['Month'] = pd.to_datetime(data_df['Month'], format='%b-%y').dt.to_period('M')

   Ind   Type   Month    Rev
0    A  Voice 2014-12  10.00
1    A  Voice 2015-01   8.00
2    A  Voice 2015-02  13.00
3    A  Voice 2015-03   9.00
4    A  Voice 2015-04  11.00
5    A  Voice 2015-05  14.00
6    A  Voice 2015-06   6.00
7    A  Voice 2015-07   4.00
8    A  Voice 2015-08  12.00
9    A  Voice 2015-09   7.00
10   A  Voice 2015-10   5.00
11   A   Elec 2014-12   8.04
12   A   Elec 2015-01   6.95
13   A   Elec 2015-02   7.58
14   A   Elec 2015-03   8.81
15   A   Elec 2015-04   8.33
16   A   Elec 2015-05   9.96
17   A   Elec 2015-06   7.24
18   A   Elec 2015-07   4.26
19   A   Elec 2015-08  10.84
20   A   Elec 2015-09   4.82
21   A   Elec 2015-10   5.68
22   B  Voice 2014-12  10.00
23   B  Voice 2015-01   8.00
24   B  Voice 2015-02  13.00
25   B  Voice 2015-03   9.00
26   B  Voice 2015-04  11.00
27   B  Voice 2015-05  14.00
28   B  Voice 2015-06   6.00
29   B  Voice 2015-07   4.00

Then use pivot_table, sorting is not necessary:

data_df =  pd.pivot_table(data_df, values='Rev', index=['Ind', 'Month'], columns='Type', aggfunc=sum)  
print data_df
Type          Elec  Voice
Ind Month                
A   2014-12   8.04     10
    2015-01   6.95      8
    2015-02   7.58     13
    2015-03   8.81      9
    2015-04   8.33     11
    2015-05   9.96     14
    2015-06   7.24      6
    2015-07   4.26      4
    2015-08  10.84     12
    2015-09   4.82      7
    2015-10   5.68      5
B   2014-12    NaN     10
    2015-01    NaN      8
    2015-02    NaN     13
    2015-03    NaN      9
    2015-04    NaN     11
    2015-05    NaN     14
    2015-06    NaN      6
    2015-07    NaN      4

Last you can change Datetimeindex in Multiindex by strftime

new_index = zip(data_df.index.get_level_values('Ind'),data_df.index.get_level_values('Month').strftime('%b-%y'))
data_df.index = pd.MultiIndex.from_tuples(new_index, names = data_df.index.names)
print data_df
Type         Elec  Voice
Ind Month               
A   Dec-14   8.04     10
    Jan-15   6.95      8
    Feb-15   7.58     13
    Mar-15   8.81      9
    Apr-15   8.33     11
    May-15   9.96     14
    Jun-15   7.24      6
    Jul-15   4.26      4
    Aug-15  10.84     12
    Sep-15   4.82      7
    Oct-15   5.68      5
B   Dec-14    NaN     10
    Jan-15    NaN      8
    Feb-15    NaN     13
    Mar-15    NaN      9
    Apr-15    NaN     11
    May-15    NaN     14
    Jun-15    NaN      6
    Jul-15    NaN      4

Or you can use reset_index, dt.strftime and set_index:

data_df = data_df.reset_index(level=1)
data_df['Month'] = data_df['Month'].dt.strftime('%b-%y')
data_df = data_df.set_index('Month', append=True)
print data_df
Type         Elec  Voice
Ind Month               
A   Dec-14   8.04     10
    Jan-15   6.95      8
    Feb-15   7.58     13
    Mar-15   8.81      9
    Apr-15   8.33     11
    May-15   9.96     14
    Jun-15   7.24      6
    Jul-15   4.26      4
    Aug-15  10.84     12
    Sep-15   4.82      7
    Oct-15   5.68      5
B   Dec-14    NaN     10
    Jan-15    NaN      8
    Feb-15    NaN     13
    Mar-15    NaN      9
    Apr-15    NaN     11
    May-15    NaN     14
    Jun-15    NaN      6
    Jul-15    NaN      4

Upvotes: 2

Alexander

Reputation: 109626

First using the solution of @jezrael to reformat the Month column, you then do this to get your pivot table:

>>> df_data.pivot_table(values='Rev', index=['Ind', 'Month'], columns='Type')
Type          Elec  Voice
Ind Month                
A   2014-12   8.04     10
    2015-01   6.95      8
    2015-02   7.58     13
    2015-03   8.81      9
    2015-04   8.33     11
    2015-05   9.96     14
    2015-06   7.24      6
    2015-07   4.26      4
    2015-08  10.84     12
    2015-09   4.82      7
    2015-10   5.68      5
B   2014-12    NaN     10
    2015-01    NaN      8
    2015-02    NaN     13
    2015-03    NaN      9
    2015-04    NaN     11
    2015-05    NaN     14
    2015-06    NaN      6
    2015-07    NaN      4

Or by using groupby with an unstack:

df.groupby(['Ind', 'Month', 'Type']).Rev.sum().unstack('Type')

Upvotes: 0

MaxU - stand with Ukraine

Reputation: 210882

You need to convert your 'Month' column to datetime after creating the DataFrame from CSV file:

df['Month'] = pd.to_datetime(df['Month'], format="%b-%y")

Because currently it's a string...

Or you can use the following trick (date_parser) in order to parse dates, during "read_csv":

from __future__ import print_function

import pandas as pd

dateparser = lambda x: pd.datetime.strptime(x, '%b-%y')

df = pd.read_csv('data.csv', delimiter=r'\s+', parse_dates=['Month'], date_parser=dateparser)

print(df.sort_values(['Month']))

PS i don't what is your preferred output date format...

Upvotes: 1

Python Pandas Pivot Table Sort by Date

Answers (3)

Related Questions