Reputation: 31
I'm trying to make a histogram in Python that looks like the one I made in R. How can I do it?
R:
age <- c(43, 23, 56, 34, 38, 37, 41)
hist(age)
Python:
import matplotlib.pyplot as plt
age = (43, 23, 56, 34, 38, 37, 41)
plt.hist(age)
plt.show()
Upvotes: 3
Views: 1475
Reputation: 535
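The difference here is caused by the way R and matplotlib choose the number of bins by default: matplotlib uses 10 equal-width bins unless told otherwise, while R computes the number of bins from the data.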
For this particular example you can use:
import matplotlib.pyplot as plt
age = (43, 23, 56, 34, 38, 37, 41)
plt.hist(age, bins=4)
to replicate the R-style histogram.
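To see what the bins argument changes, it can help to look at the counts rather than the plot. plt.hist does its binning with NumPy's histogram function, so the minimal sketch below calls numpy.histogram directly; the 10-bin default assumes matplotlib's stock rcParams["hist.bins"] setting has not been changed.

```python
import numpy as np

age = np.array([43, 23, 56, 34, 38, 37, 41])

# matplotlib's default: 10 equal-width bins between min(age) and max(age)
default_counts, default_edges = np.histogram(age, bins=10)

# 4 equal-width bins, the same number of bins R picks for 7 observations
counts, edges = np.histogram(age, bins=4)

print(default_counts)  # several bins are empty, so the plot looks sparse
print(counts)
print(edges)           # evenly spaced from 23 to 56
```

With 4 bins the counts are 1, 3, 2 and 1, the same counts R draws for its 20-30, 30-40, 40-50 and 50-60 bars, even though the bin edges themselves differ.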
If we want matplotlib's histograms to look like R's in general, all we need to do is replicate the binning logic that R uses. Internally, R uses Sturges' formula* to calculate the number of bins. matplotlib supports this out of the box; we just have to pass 'sturges' for the bins argument.
age = (43, 23, 56, 34, 38, 37, 41)
plt.hist(age, bins='sturges')
* It's a little more complicated internally (R passes this number of bins to pretty(), which rounds the break points to "nice" values), but this gets us most of the way there.
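The extra step mentioned in the footnote is that R's hist() hands the Sturges count to pretty(), which moves the break points to round numbers. The function below, nice_breaks, is a hypothetical and deliberately simplified stand-in for that rounding, not a port of R's pretty(): it picks a step of 1, 2, 5 or 10 times a power of ten that is at least (max - min) / n_bins, then widens the range outward to multiples of that step.

```python
import math

def nice_breaks(data, n_bins):
    """Hypothetical, simplified approximation of R's pretty(); NOT R's exact algorithm."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("nice_breaks needs data with a non-zero range")
    raw_step = (hi - lo) / n_bins
    magnitude = 10 ** math.floor(math.log10(raw_step))
    # Smallest "nice" step (1, 2, 5 or 10 times a power of ten) covering the raw step
    step = next(m * magnitude for m in (1, 2, 5, 10) if m * magnitude >= raw_step)
    # Extend the range outward to whole multiples of the step
    start = math.floor(lo / step) * step
    stop = math.ceil(hi / step) * step
    count = round((stop - start) / step)
    return [start + i * step for i in range(count + 1)]

age = [43, 23, 56, 34, 38, 37, 41]
print(nice_breaks(age, 4))  # [20, 30, 40, 50, 60]
```

The resulting list can be passed straight to matplotlib, e.g. plt.hist(age, bins=nice_breaks(age, 4)); for other data this simplified rule may choose different breaks than R would.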
Upvotes: 3
Reputation: 339490
In short, use bins="sturges" in the plt.hist call.
From the numpy.histogram_bin_edges documentation for bins:
[...]
‘sturges’
R’s default method, only accounts for data size. Only optimal for gaussian data and underestimates number of bins for large non-gaussian datasets.
So you will get a histogram similar to R's via
import matplotlib.pyplot as plt
import numpy as np
age = np.array((43, 23, 56, 34, 38, 37, 41))
plt.hist(age, bins="sturges", facecolor="none", edgecolor="k")
plt.show()
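Because the 'sturges' rule only looks at the number of observations, its bin count can be checked by hand. A minimal sketch, assuming NumPy 1.15 or later (where numpy.histogram_bin_edges was added), comparing R's nclass.Sturges formula, ceiling(log2(n) + 1), with NumPy's estimator:

```python
import math
import numpy as np

age = np.array([43, 23, 56, 34, 38, 37, 41])

# Sturges' rule as R's nclass.Sturges computes it: ceiling(log2(n) + 1)
n_bins = math.ceil(math.log2(age.size) + 1)

# NumPy's 'sturges' estimator, which plt.hist uses for bins="sturges"
edges = np.histogram_bin_edges(age, bins="sturges")

print(n_bins)          # 4 bins for 7 observations
print(len(edges) - 1)  # also 4 bins
print(edges)           # equal-width edges running from min(age) to max(age)
```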
Note, however, that the outer bin edges are still the minimum and maximum of the data (23 and 56), whereas R's run from 20 to 60. There is no way to change this automatically, but you could set the bins manually to exactly those from the R diagram via bins=(20, 30, 40, 50, 60).
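A minimal check of the manual-bins suggestion, again using NumPy because plt.hist passes an explicit sequence of edges through to numpy.histogram. One caveat: NumPy's bins are closed on the left ([20, 30), with only the last bin [50, 60] closed on both sides), whereas R's hist() closes them on the right by default ((30, 40]), so a value sitting exactly on a break can land in a different bar. None of these ages falls on a break, so the counts agree with R's.

```python
import numpy as np

age = np.array([43, 23, 56, 34, 38, 37, 41])

# Explicit edges taken from the R plot; NumPy bins them as [20,30), [30,40), [40,50), [50,60]
counts, edges = np.histogram(age, bins=(20, 30, 40, 50, 60))
print(counts)  # [1 3 2 1]

# The equivalent matplotlib call would be:
# plt.hist(age, bins=(20, 30, 40, 50, 60), facecolor="none", edgecolor="k")
```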
Upvotes: 2