marillion
marillion

Reputation: 11200

Pandas: bar plot with multiIndex dataframe

I have a pandas DataFrame with a TIMESTAMP column (not the index), and the timestamp format is as follows:

2015-03-31 22:56:45.510

I also have columns called CLASS and AXLES. I would like to compute the count of records for each month separately for each unique value of AXLES (AXLES can take an integer value between 3-12).

I came up with a combination of resample and groupby:

resamp = dfWIM.set_index('TIMESTAMP').groupby('AXLES').resample('M', how='count').CLASS

This seems to give me a multiIndex dataframe object, as shown below.

In [72]: resamp

Out [72]:

AXLES  TIMESTAMP 
3      2014-07-31      5517
       2014-08-31     31553
       2014-09-30     42816
       2014-10-31     49308
       2014-11-30     44168
       2014-12-31     45518
       2015-01-31     54782
       2015-02-28     52166
       2015-03-31     47929
4      2014-07-31      3147
       2014-08-31     24810
       2014-09-30     39075
       2014-10-31     46857
       2014-11-30     42651
       2014-12-31     48282
       2015-01-31     42708
       2015-02-28     43904
       2015-03-31     50033

From here, how can I access different components of this multiIndex object to create a bar plot for the following conditions?

Thanks!

EDIT: Following code gives me the plot, but I could not change the xtick formatting to MM-YY.

resamp[3].plot(kind='bar')

enter image description here

EDIT 2 below is a code snippet that generates a small sample of the data similar to what I have:

dftest = {'TIMESTAMP':['2014-08-31','2014-09-30','2014-10-31'], 'AXLES':[3, 3, 3], 'CLASS':[5,6,7]}
dfTest = pd.DataFrame(dftest)
dfTest.TIMESTAMP = pd.to_datetime(pd.Series(dfTest.TIMESTAMP))
resamp = dfTest.set_index('TIMESTAMP').groupby('AXLES').resample('M', how='count').CLASS
resamp[3].plot(kind='bar')

EDIT 3: Here below is the solution:

A.Plot the whole resampled dataframe (based on @Ako 's suggestion):

df = resamp.unstack(0)
df.index = [ts.strftime('%b 20%y') for ts in df.index]
df.plot(kind='bar', rot=0)

enter image description here

B.Plot an individual index from the resampled dataframe (based on @Alexander 's suggestion):

df = resamp[3]
df.index = [ts.strftime('%b 20%y') for ts in df.index]
df.plot(kind='bar', rot=0)

enter image description here

Upvotes: 6

Views: 6170

Answers (2)

unutbu
unutbu

Reputation: 880797

You could generate and set the labels explicitly using ax.xaxis.set_major_formatter with a ticker.FixedFormatter. This will allow you to keep your DataFrame's MultiIndex with timestamp values, while displaying the timestamps in the desired %m-%Y format:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.ticker as ticker

dftest = {'TIMESTAMP':['2014-08-31','2014-09-30','2014-10-31'], 'AXLES':[3, 3, 3], 'CLASS':[5,6,7]}
dfTest = pd.DataFrame(dftest)
dfTest.TIMESTAMP = pd.to_datetime(pd.Series(dfTest.TIMESTAMP))
resamp = dfTest.set_index('TIMESTAMP').groupby('AXLES').resample('M', how='count').CLASS

ax = resamp[3].plot(kind='bar')
ticklabels = [timestamp.strftime('%m-%Y') for axle, timestamp in resamp.index]
ax.xaxis.set_major_formatter(ticker.FuncFormatter(lambda x, pos: ticklabels[int(x)]))
plt.gcf().autofmt_xdate()

plt.show()

yields enter image description here

Upvotes: 3

Alexander
Alexander

Reputation: 109726

The following should work, but it is difficult to test without some data.

Start by resetting your index to get access to the TIMESTAMP column. Then use strftime to format it to your desired text representation (e.g. mm-yy). Finally, reset the index back to AXLES and TIMESTAMP.

df = resamp.reset_index()
df['TIMESTAMP'] = [ts.strftime('%m-%y') for ts in df.TIMESTAMP]
df.set_index(['AXLES', 'TIMESTAMP'], inplace=True)
>>> df.xs(3, level=0).plot(kind='bar')

enter image description here

Upvotes: 1

Related Questions