Reputation: 67
Relatively new to Python and pandas. I have a dataframe df with two columns (say, 0 and 1) and n rows. I'd like to plot histograms of the two time series represented by the two columns. I also need access to the exact counts in each histogram bin for later manipulation.
b_counts, b_bins = np.histogram(df[0], bins = 10)
a_counts, a_bins = np.histogram(df[1], bins = 10)
plt.bar(b_bins, b_counts)
plt.bar(a_bins, a_counts)
However, I get an error for incompatible sizes: the bins array has length 11, whereas the counts array has length 10. Two questions: 1) Why does numpy's histogram return an extra bin, i.e., 11 values instead of 10? 2) Assuming question 1) can be solved, is this the best/simplest way of going about this?
Upvotes: 1
Views: 1288
Reputation: 8683
I would directly use pyplot's built-in histogram function, which draws the plot and returns the counts and bin edges in one call:
b_counts, b_bins, _ = plt.hist(df[0], bins = 10)
a_counts, a_bins, _ = plt.hist(df[1], bins = 10)
As per the documentation of numpy.histogram (if you scroll down far enough to read the Returns section after the parameter definitions):
hist : array The values of the histogram. See density and weights for a description of the possible semantics.
bin_edges : array of dtype float Return the bin edges (length(hist)+1).
In other words, the second array holds the 11 edges that delimit the 10 bins, not the bins themselves.
Quite clear, isn't it?
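If you would rather keep np.histogram and draw the bars yourself, one possible approach is to pass only the left edges to plt.bar and derive the bar widths from the edges. The following is a minimal sketch, not the only way to do it: the helper names histogram_bars and plot_columns are made up for illustration, and it assumes the columns contain no NaN values.

```python
import numpy as np


def histogram_bars(values, bins=10):
    """Return (counts, left_edges, widths), all of length `bins`."""
    counts, edges = np.histogram(values, bins=bins)
    # edges has len(counts) + 1 entries: bin i spans edges[i]..edges[i + 1]
    left_edges = edges[:-1]
    widths = np.diff(edges)
    return counts, left_edges, widths


def plot_columns(df, columns=(0, 1), bins=10):
    """Overlay one bar histogram per column and return the counts per column."""
    # Imported here so that histogram_bars itself only needs numpy.
    import matplotlib.pyplot as plt

    all_counts = {}
    for col in columns:
        counts, left, widths = histogram_bars(df[col], bins=bins)
        # align="edge" places each bar's left side on its bin's left edge
        plt.bar(left, counts, width=widths, align="edge", alpha=0.5, label=str(col))
        all_counts[col] = counts
    plt.legend()
    return all_counts
```

Calling plot_columns(df) followed by plt.show() should give two overlaid histograms, with the per-bin counts available in the returned dictionary for later use.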
Upvotes: 2