Hoggie Johnson

Reputation: 80

Matplotlib histogram from x,y values with datetime months as bins

I have an array x of datetime objects and an array y of values corresponding to those datetimes. I'm trying to create a histogram that groups the y values into bins by month: that is, sum all y values that fall in the same month and plot the total for each month.

This is a simplified version of what my data looks like:

x = np.array([datetime.datetime(2014, 2, 1, 0, 0), datetime.datetime(2014, 2, 13, 0, 0),
              datetime.datetime(2014, 3, 4, 0, 0), datetime.datetime(2014, 3, 6, 0, 0)])

y = np.array([4, 3, 2, 6])

The end result should be a histogram showing month 2 in 2014 with y value 7 and month 3 in 2014 with y value 8.

The first thing I tried was creating a pandas dataframe out of my two arrays, like so:

frame = pd.DataFrame({'x':x,'y':y})

This worked fine, with x mapping to the datetime objects and y to the corresponding values. However, after creating this dataframe I'm lost on how to sum the y values by month and create bins from these months using plt.hist().

Upvotes: 3

Views: 4674

Answers (2)

piRSquared

Reputation: 294488

Do This First

df = pd.DataFrame(dict(y=y), pd.DatetimeIndex(x, name='x'))

df

            y
x            
2014-02-01  4
2014-02-13  3
2014-03-04  2
2014-03-06  6

Option 1

df.resample('M').sum().hist()

Option 2

df.groupby(pd.Grouper(freq='M')).sum().hist()

Or Do This First

df = pd.DataFrame(dict(x=pd.to_datetime(x), y=y))

df

           x  y
0 2014-02-01  4
1 2014-02-13  3
2 2014-03-04  2
3 2014-03-06  6

Option 3

df.resample('M', on='x').sum().hist()

Yields

(image of the resulting plot)
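A minimal sketch of the same monthly summing, written so the totals can be checked before anything is drawn. It uses the data from the question; the helpers `monthly_totals` and `plot_monthly_totals` are hypothetical names introduced here, not part of the answer. It groups on a monthly period via `to_period("M")` instead of `resample`, and it assumes one bar per month is wanted: `.hist()` on the summed frame bins the totals themselves (7 and 8), whereas `.plot.bar()` draws one bar per month.

```python
import datetime

import numpy as np
import pandas as pd

# Data from the question.
x = np.array([datetime.datetime(2014, 2, 1), datetime.datetime(2014, 2, 13),
              datetime.datetime(2014, 3, 4), datetime.datetime(2014, 3, 6)])
y = np.array([4, 3, 2, 6])


def monthly_totals(dates, values):
    """Sum values per calendar month (hypothetical helper for illustration)."""
    series = pd.Series(values, index=pd.DatetimeIndex(dates))
    # A monthly period keeps year and month together, e.g. 2014-02.
    return series.groupby(series.index.to_period("M")).sum()


def plot_monthly_totals(totals):
    """Draw one bar per month (hypothetical helper; needs matplotlib)."""
    import matplotlib.pyplot as plt

    ax = totals.plot.bar()
    ax.set_xlabel("month")
    ax.set_ylabel("sum of y")
    plt.show()


totals = monthly_totals(x, y)
print(totals)
```

Calling `plot_monthly_totals(totals)` afterwards should show two bars, one for 2014-02 and one for 2014-03.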

Upvotes: 3

Ilya V. Schurov

Reputation: 8067

First of all, thanks for a well-posed question with an example of your data.

This seems to be what you want:

import pandas as pd
import numpy as np
import datetime
# IPython/Jupyter magic; omit when running as a plain script
%matplotlib inline

x = np.array([datetime.datetime(2014, 2, 1, 0, 0), 
              datetime.datetime(2014, 2, 13, 0, 0),
              datetime.datetime(2014, 3, 4, 0, 0), 
              datetime.datetime(2014, 3, 6, 0, 0)])

y = np.array([4,3,2,6])

frame = pd.DataFrame({'x':x,'y':y})
(frame
 .set_index('x')                           # use the datetimes as the index
 .assign(month=lambda df: df.index.month)  # add a column holding the month number
 .groupby('month')                         # group by that column
 .sum()                                    # sum the only remaining column, 'y'
 .plot.bar())                              # draw a bar plot

The result
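One caveat with grouping on the month number alone: if the data spans more than one year, the same month from different years ends up in a single group. The sketch below illustrates this with made-up dates (the 2015 entry and its value are assumptions for illustration, not data from the question) and shows one possible way around it, grouping on year and month together.

```python
import datetime

import pandas as pd

# Hypothetical data spanning two years (not from the question).
dates = [datetime.datetime(2014, 2, 1),
         datetime.datetime(2014, 3, 4),
         datetime.datetime(2015, 2, 10)]
values = [4, 2, 5]

frame = pd.DataFrame({"x": dates, "y": values}).set_index("x")

# Month number only: February 2014 and February 2015 are summed together.
by_month = (frame
            .assign(month=lambda df: df.index.month)
            .groupby("month")["y"]
            .sum())

# Year and month together: each calendar month keeps its own total.
by_year_month = (frame
                 .assign(year=lambda df: df.index.year,
                         month=lambda df: df.index.month)
                 .groupby(["year", "month"])["y"]
                 .sum())

print(by_month)
print(by_year_month)
```

Either result can be passed to `.plot.bar()` as in the answer; for single-year data such as the question's, the two groupings give the same totals.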

Upvotes: 4
