Reputation: 21
I have the following code:
data_df = pandas.read_csv(filename, parse_dates = True)
groupings = np.unique(data_df[['Ind']])
for group in groupings:
data_df2 = data_df[data_df['Ind'] == group]
table = pandas.pivot_table(data_df2, values='Rev', index=['Ind', 'Month'], columns=['Type'], aggfunc=sum)
table = table.sort_index(ascending=[0, 0])
print(table)
How can I sort the pivot 'table' by month and year (e.g. when I print 'table' I want Dec-14 to be the first row of output for each group)?
Below is a sample of the data in 'data_df':
Ind Type Month Rev
0 A Voice Dec-14 10.00
1 A Voice Jan-15 8.00
2 A Voice Feb-15 13.00
3 A Voice Mar-15 9.00
4 A Voice Apr-15 11.00
5 A Voice May-15 14.00
6 A Voice Jun-15 6.00
7 A Voice Jul-15 4.00
8 A Voice Aug-15 12.00
9 A Voice Sep-15 7.00
10 A Voice Oct-15 5.00
11 A Elec Dec-14 8.04
12 A Elec Jan-15 6.95
13 A Elec Feb-15 7.58
14 A Elec Mar-15 8.81
15 A Elec Apr-15 8.33
16 A Elec May-15 9.96
17 A Elec Jun-15 7.24
18 A Elec Jul-15 4.26
19 A Elec Aug-15 10.84
20 A Elec Sep-15 4.82
21 A Elec Oct-15 5.68
22 B Voice Dec-14 10.00
23 B Voice Jan-15 8.00
24 B Voice Feb-15 13.00
25 B Voice Mar-15 9.00
26 B Voice Apr-15 11.00
27 B Voice May-15 14.00
28 B Voice Jun-15 6.00
29 B Voice Jul-15 4.00
.. .. ... ... ...
The output is (I was playing with ascending but it only wants to sort alpha):
Type Elec Voice
Ind Month
A Sep-15 4.82 7
Oct-15 5.68 5
May-15 9.96 14
Mar-15 8.81 9
Jun-15 7.24 6
Jul-15 4.26 4
Jan-15 6.95 8
Feb-15 7.58 13
Dec-14 8.04 10
Aug-15 10.84 12
Apr-15 8.33 11
I want the output to be sorted by date:
Type Elec Voice
Ind Month
A Dec-14 8.04 10
Jan-15 6.95 8
Feb-15 7.58 13
...
Upvotes: 2
Views: 5658
Reputation: 863166
I think you can first convert column Month
to_datetime
and then to_period
:
data_df['Month'] = pd.to_datetime(data_df['Month'], format='%b-%y').dt.to_period('M')
Ind Type Month Rev
0 A Voice 2014-12 10.00
1 A Voice 2015-01 8.00
2 A Voice 2015-02 13.00
3 A Voice 2015-03 9.00
4 A Voice 2015-04 11.00
5 A Voice 2015-05 14.00
6 A Voice 2015-06 6.00
7 A Voice 2015-07 4.00
8 A Voice 2015-08 12.00
9 A Voice 2015-09 7.00
10 A Voice 2015-10 5.00
11 A Elec 2014-12 8.04
12 A Elec 2015-01 6.95
13 A Elec 2015-02 7.58
14 A Elec 2015-03 8.81
15 A Elec 2015-04 8.33
16 A Elec 2015-05 9.96
17 A Elec 2015-06 7.24
18 A Elec 2015-07 4.26
19 A Elec 2015-08 10.84
20 A Elec 2015-09 4.82
21 A Elec 2015-10 5.68
22 B Voice 2014-12 10.00
23 B Voice 2015-01 8.00
24 B Voice 2015-02 13.00
25 B Voice 2015-03 9.00
26 B Voice 2015-04 11.00
27 B Voice 2015-05 14.00
28 B Voice 2015-06 6.00
29 B Voice 2015-07 4.00
Then use pivot_table
, sorting is not necessary:
data_df = pd.pivot_table(data_df, values='Rev', index=['Ind', 'Month'], columns='Type', aggfunc=sum)
print data_df
Type Elec Voice
Ind Month
A 2014-12 8.04 10
2015-01 6.95 8
2015-02 7.58 13
2015-03 8.81 9
2015-04 8.33 11
2015-05 9.96 14
2015-06 7.24 6
2015-07 4.26 4
2015-08 10.84 12
2015-09 4.82 7
2015-10 5.68 5
B 2014-12 NaN 10
2015-01 NaN 8
2015-02 NaN 13
2015-03 NaN 9
2015-04 NaN 11
2015-05 NaN 14
2015-06 NaN 6
2015-07 NaN 4
Last you can change Datetimeindex
in Multiindex
by strftime
new_index = zip(data_df.index.get_level_values('Ind'),data_df.index.get_level_values('Month').strftime('%b-%y'))
data_df.index = pd.MultiIndex.from_tuples(new_index, names = data_df.index.names)
print data_df
Type Elec Voice
Ind Month
A Dec-14 8.04 10
Jan-15 6.95 8
Feb-15 7.58 13
Mar-15 8.81 9
Apr-15 8.33 11
May-15 9.96 14
Jun-15 7.24 6
Jul-15 4.26 4
Aug-15 10.84 12
Sep-15 4.82 7
Oct-15 5.68 5
B Dec-14 NaN 10
Jan-15 NaN 8
Feb-15 NaN 13
Mar-15 NaN 9
Apr-15 NaN 11
May-15 NaN 14
Jun-15 NaN 6
Jul-15 NaN 4
Or you can use reset_index
, dt.strftime
and set_index
:
data_df = data_df.reset_index(level=1)
data_df['Month'] = data_df['Month'].dt.strftime('%b-%y')
data_df = data_df.set_index('Month', append=True)
print data_df
Type Elec Voice
Ind Month
A Dec-14 8.04 10
Jan-15 6.95 8
Feb-15 7.58 13
Mar-15 8.81 9
Apr-15 8.33 11
May-15 9.96 14
Jun-15 7.24 6
Jul-15 4.26 4
Aug-15 10.84 12
Sep-15 4.82 7
Oct-15 5.68 5
B Dec-14 NaN 10
Jan-15 NaN 8
Feb-15 NaN 13
Mar-15 NaN 9
Apr-15 NaN 11
May-15 NaN 14
Jun-15 NaN 6
Jul-15 NaN 4
Upvotes: 2
Reputation: 109626
First using the solution of @jezrael to reformat the Month column, you then do this to get your pivot table:
>>> df_data.pivot_table(values='Rev', index=['Ind', 'Month'], columns='Type')
Type Elec Voice
Ind Month
A 2014-12 8.04 10
2015-01 6.95 8
2015-02 7.58 13
2015-03 8.81 9
2015-04 8.33 11
2015-05 9.96 14
2015-06 7.24 6
2015-07 4.26 4
2015-08 10.84 12
2015-09 4.82 7
2015-10 5.68 5
B 2014-12 NaN 10
2015-01 NaN 8
2015-02 NaN 13
2015-03 NaN 9
2015-04 NaN 11
2015-05 NaN 14
2015-06 NaN 6
2015-07 NaN 4
Or by using groupby
with an unstack
:
df.groupby(['Ind', 'Month', 'Type']).Rev.sum().unstack('Type')
Upvotes: 0
Reputation: 210882
You need to convert your 'Month' column to datetime after creating the DataFrame from CSV file:
df['Month'] = pd.to_datetime(df['Month'], format="%b-%y")
Because currently it's a string...
Or you can use the following trick (date_parser
) in order to parse dates, during "read_csv":
from __future__ import print_function
import pandas as pd
dateparser = lambda x: pd.datetime.strptime(x, '%b-%y')
df = pd.read_csv('data.csv', delimiter=r'\s+', parse_dates=['Month'], date_parser=dateparser)
print(df.sort_values(['Month']))
PS i don't what is your preferred output date format...
Upvotes: 1