Reputation: 2007
I would like to get histogram values from a DataFrame:
%matplotlib inline
import pandas as pd
import numpy as np
df = pd.DataFrame(60 * np.random.sample((100, 4)),
                  index=pd.date_range('1/1/2014', periods=100, freq='D'),
                  columns=['A', 'B', 'C', 'D'])
Using pd.cut(), it is possible to do this for a single column, for example:
bins=np.linspace(0,60,5)
df.groupby(pd.cut(df.A,bins)).count()
Is it possible to get the histogram values for all columns at once, in a single DataFrame?
The desired output would look like this:
            A   B   C   D
(0, 15]    21  10   1   2
(15, 30]   14  24  21  24
(30, 45]   10   0  22  30
(45, 60]   25   5  25  25
Upvotes: 2
Views: 2283
Reputation: 3009
How about this technique, essentially a list comprehension and a pd.concat():
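np.random.seed(1)  # only affects the counts if it runs before df is created
bins = np.linspace(0, 60, 5)
# Bin and count each column separately, then join the results column-wise.
hist = pd.concat([df[x].groupby(pd.cut(df[x], bins)).count() for x in df.columns], axis=1)
hist.index.names = [None]
print(hist)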
which for me produces:
            A   B   C   D
(0, 15]    26  20  31  23
(15, 30]   23  23  20  18
(30, 45]   24  32  24  29
(45, 60]   27  25  25  30
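As a possible alternative to the concat approach, the same table can probably be built with DataFrame.apply and value_counts. This is only a sketch, not part of the answer as originally posted: it assumes the data and bins from the question, the name hist is just illustrative, and the seeded RandomState is there only to make the example repeatable.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the question; the seeded generator is an
# assumption added here purely for reproducibility.
rng = np.random.RandomState(1)
df = pd.DataFrame(
    60 * rng.random_sample((100, 4)),
    index=pd.date_range("2014-01-01", periods=100, freq="D"),
    columns=["A", "B", "C", "D"],
)

bins = np.linspace(0, 60, 5)  # bin edges: 0, 15, 30, 45, 60

# Bin each column with pd.cut, then count how many values fall in each bin.
# sort=False keeps the bins in interval order instead of ordering by count.
hist = df.apply(lambda col: pd.cut(col, bins).value_counts(sort=False))
print(hist)
```

Note that pd.cut uses right-closed intervals such as (0, 15] by default, so a value of exactly 0 would not be counted unless include_lowest=True is passed.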
Upvotes: 2