deserthiker

Reputation: 67

Histograms in Pandas

Relatively new to Python and pandas. I have a DataFrame, df, with two columns (say, 0 and 1) and n rows. I'd like to plot histograms of the two time series held in those columns. I also need access to the exact count in each histogram bin for later manipulation.

import numpy as np
import matplotlib.pyplot as plt

b_counts, b_bins = np.histogram(df[0], bins=10)
a_counts, a_bins = np.histogram(df[1], bins=10)

plt.bar(b_bins, b_counts)
plt.bar(a_bins, a_counts)

However, I get an error about incompatible sizes: the bins array has length 11, whereas the counts array has length 10. Two questions: 1) Why does numpy's histogram return an extra bin, i.e., 11 instead of 10? 2) Assuming question 1) can be solved, is this the best/simplest way of going about this?

Upvotes: 1

Views: 1288

Answers (1)

Kartik

Reputation: 8683

I would directly use pyplot's built-in histogram function, which draws the plot and returns the counts and bin edges in one call:

b_counts, b_bins, _ = plt.hist(df[0], bins = 10)
a_counts, a_bins, _ = plt.hist(df[1], bins = 10)
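For anyone who wants to try this end to end, here is a minimal sketch. The DataFrame is made-up random data standing in for the asker's df, and the Agg backend and the two side-by-side subplots are my own assumptions, not part of the question. It uses Axes.hist, the axes-level counterpart of plt.hist, so each column gets its own panel.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-in for the asker's df: two columns named 0 and 1, n = 500 rows.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(500, 2)))

# One panel per column, so the two histograms do not overlap.
fig, (ax_b, ax_a) = plt.subplots(1, 2, figsize=(8, 3))

# Axes.hist draws the bars and returns (counts, bin_edges, patches).
b_counts, b_bins, _ = ax_b.hist(df[0], bins=10)
a_counts, a_bins, _ = ax_a.hist(df[1], bins=10)

# The counts are ordinary NumPy arrays, so they can be manipulated later.
print(len(b_counts), len(b_bins))
print(b_counts.sum(), a_counts.sum())
```

Each counts array has one entry per bin and each bins array has one more entry than that, which is the same shape relationship the question ran into with np.histogram.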

As per the documentation of numpy.histogram (see the Returns section, below the parameter definitions):

hist : array
    The values of the histogram. See density and weights for a description of the possible semantics.

bin_edges : array of dtype float
    Return the bin edges (length(hist)+1).

Quite clear, isn't it?
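To spell out the arithmetic: 10 bins are delimited by 11 edges, because bin i runs from edge i to edge i + 1. If you would rather keep np.histogram and draw the bars yourself, one possible approach is to pass only the left edges to the bar call and use the edge spacing as the bar width. The sketch below uses made-up data, and the variable names are my own.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up data standing in for one column of the asker's df.
rng = np.random.default_rng(1)
values = rng.normal(size=500)

counts, edges = np.histogram(values, bins=10)

# 10 counts, 11 edges: bin i spans edges[i] to edges[i + 1].
left_edges = edges[:-1]   # drop the final right edge so the lengths match
widths = np.diff(edges)   # width of each bin

fig, ax = plt.subplots()
# align="edge" places each bar's left side at its bin's left edge.
bars = ax.bar(left_edges, counts, width=widths, align="edge")

print(len(counts), len(edges), len(left_edges))
```

Passing all 11 edges as x positions is what triggers the size mismatch in the question; trimming to the 10 left edges is enough to make the bar call accept the data.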

Upvotes: 2
