letsplay

Reputation: 31

How can I make a histogram in Python that looks like the one from R's hist function?

I'm trying to do a histogram in Python just like I did in R. How can I do it?

R:

age <- c(43, 23, 56, 34, 38, 37, 41)
hist(age)

R output

Python:

import matplotlib.pyplot as plt

age = (43, 23, 56, 34, 38, 37, 41)
plt.hist(age)

matplotlib output

Upvotes: 3

Views: 1475

Answers (2)

Ammar Askar

Reputation: 535

The difference here is caused by the way R and matplotlib choose the number of bins by default.

For this particular example you can use:

import matplotlib.pyplot as plt

age = (43, 23, 56, 34, 38, 37, 41)
plt.hist(age, bins=4)

to replicate the R-style histogram.

General Case

If we want matplotlib's histograms to look like R's in general, all we need to do is replicate the binning logic that R uses. Internally, R uses Sturges' formula* to calculate the number of bins. matplotlib supports this out of the box: we just have to pass 'sturges' for the bins argument.

import matplotlib.pyplot as plt

age = (43, 23, 56, 34, 38, 37, 41)
plt.hist(age, bins='sturges')

* It's a little more complicated internally: R treats the Sturges count only as a suggestion and then picks "pretty" round break points, but this gets us most of the way there.
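To see why bins=4 works for this example, here is a minimal sketch of the bin count that Sturges' formula gives, ceil(log2(n) + 1). The helper name sturges_bin_count is made up for illustration and is not part of matplotlib or NumPy.

```python
import math


def sturges_bin_count(n):
    """Number of bins suggested by Sturges' formula for n observations.

    Hypothetical helper for illustration only.
    """
    return math.ceil(math.log2(n) + 1)


age = (43, 23, 56, 34, 38, 37, 41)

# 7 observations: log2(7) + 1 is about 3.81, which rounds up to 4.
print(sturges_bin_count(len(age)))
```

With 7 values this suggests 4 bins, which matches the bins=4 used for the specific example above.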

Upvotes: 3

ImportanceOfBeingErnest

Reputation: 339490

In short, use bins="sturges" in the plt.hist call.


From the numpy.histogram_bin_edges documentation:

bins:
[...]
‘sturges’ R’s default method, only accounts for data size. Only optimal for gaussian data and underestimates number of bins for large non-gaussian datasets.

So you will get a histogram similar to R's via

import matplotlib.pyplot as plt
import numpy as np

age = np.array((43, 23, 56, 34, 38, 37, 41))

plt.hist(age, bins="sturges", facecolor="none", edgecolor="k")

plt.show()


Note, however, that the outer bin edges are still the minimum and maximum of the data. There is no way to change this automatically, but you could set the bins manually to be exactly those from the R diagram via bins=(20,30,40,50,60).
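If typing the edges by hand is inconvenient, a rough sketch of one way to compute round edges is shown here. It assumes you pick the step yourself (10 in this example); R chooses its step through pretty(), which this does not replicate. The helper name round_edges is hypothetical.

```python
import math


def round_edges(data, step):
    """Return bin edges at multiples of `step` that cover all of `data`.

    Hypothetical helper: the step is chosen by the caller, whereas R
    derives it automatically.
    """
    lo = math.floor(min(data) / step) * step
    hi = math.ceil(max(data) / step) * step
    n_bins = round((hi - lo) / step)
    return [lo + i * step for i in range(n_bins + 1)]


age = (43, 23, 56, 34, 38, 37, 41)
edges = round_edges(age, step=10)
print(edges)
```

The resulting list can be passed as bins=edges to plt.hist. One caveat: matplotlib's bins are closed on the left, [a, b), except for the last one, while R's default intervals are closed on the right, (a, b], so a value lying exactly on an edge may be counted in a different bar.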

Upvotes: 2
