Hrishabh Nayal

Reputation: 125

How to plot a histogram in matplotlib in Python?

I know how to plot a histogram when the individual data points are given, e.g. (33, 45, 54, 33, 21, 29, 15, ...),

by simply using something like matplotlib.pyplot.hist(x, bins=10).

But what if I only have grouped data, like:

| Marks | Number of students |
| ----- | ------------------ |
| 0-10  | 8                  |
| 10-20 | 12                 |
| 20-30 | 24                 |
| 30-40 | 26                 |
| ...   | ...                |

and so on.

I know that I can mimic a histogram with a bar plot by adjusting the xticks, but what if I want to do this using only the hist function of matplotlib.pyplot?

Is it possible to do this?

Upvotes: 0

Views: 7599

Answers (2)

tdy

Reputation: 41327

You can build the hist() parameters manually and use the existing counts as weights.

Say you have this df:

>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> df = pd.DataFrame({'Marks': ['0-10', '10-20', '20-30', '30-40'], 'Number of students': [8, 12, 24, 26]})
>>> df
   Marks  Number of students
0   0-10                   8
1  10-20                  12
2  20-30                  24
3  30-40                  26

The bins are all the unique boundary values in Marks:

>>> bins = pd.unique(df.Marks.str.split('-', expand=True).astype(int).values.ravel())
>>> bins
array([ 0, 10, 20, 30, 40])

Choose one x value per bin, e.g. the left edge, to keep it simple:

>>> x = bins[:-1]
>>> x
array([ 0, 10, 20, 30])

Use the existing value counts (Number of students) as weights:

>>> weights = df['Number of students'].values
>>> weights
array([ 8, 12, 24, 26])
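Only bin membership matters for the x values, so the left edges could just as well be replaced by, say, the bin midpoints. A quick sketch to check this without plotting uses NumPy's `np.histogram`, which `plt.hist` uses for its binning; the midpoint choice and the hard-coded arrays (copied from the df above) are only an illustration.

```python
import numpy as np

# Bin edges and per-bin counts, copied from the example df above
bins = np.array([0, 10, 20, 30, 40])
weights = np.array([8, 12, 24, 26])

# Midpoints of each bin: 5, 15, 25, 35 (any value inside its bin would work)
x_mid = (bins[:-1] + bins[1:]) / 2

# Each x lands in exactly one bin and contributes its weight there,
# so the weighted counts reproduce the grouped data
counts, edges = np.histogram(x_mid, bins=bins, weights=weights)
print(counts)
```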

Then plug these into hist():

>>> plt.hist(x=x, bins=bins, weights=weights)
>>> plt.show()

(Image: the reconstructed histogram)
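Put together as one script, the steps above might look like the following sketch. The `Agg` backend, the axis labels, `edgecolor`, and the output filename `grouped_hist.png` are assumptions added for illustration; drop the `matplotlib.use("Agg")` line and call `plt.show()` instead to get an interactive window.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Marks": ["0-10", "10-20", "20-30", "30-40"],
    "Number of students": [8, 12, 24, 26],
})

# Split "0-10" into [0, 10], flatten, and keep each boundary once -> bin edges
bins = pd.unique(df["Marks"].str.split("-", expand=True).astype(int).values.ravel())

# One representative x per bin (its left edge), with the counts as weights
x = bins[:-1]
weights = df["Number of students"].to_numpy()

fig, ax = plt.subplots()
# hist returns the bar heights, the bin edges, and the drawn patches
counts, edges, patches = ax.hist(x=x, bins=bins, weights=weights, edgecolor="black")
ax.set_xlabel("Marks")
ax.set_ylabel("Number of students")
fig.savefig("grouped_hist.png")  # hypothetical output filename
```

The returned `counts` should match the "Number of students" column, which is an easy way to confirm the bins were parsed correctly.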

Upvotes: 2

thomas_bssnt

Reputation: 104

One possibility is to “ungroup” the data yourself.

For example, for the 8 students with a mark between 0 and 10, you can generate 8 data points with value 5 (the midpoint of the interval). For the 12 students with a mark between 10 and 20, you can generate 12 data points with value 15, and so on.
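A minimal sketch of this “ungrouping” with `np.repeat`, assuming the bin edges and counts from the question have already been put into arrays (hard-coded here); the `Agg` backend and the filename `ungrouped_hist.png` are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
import numpy as np

bins = np.array([0, 10, 20, 30, 40])
counts = np.array([8, 12, 24, 26])

# Midpoint of each interval: 5, 15, 25, 35
midpoints = (bins[:-1] + bins[1:]) / 2

# Repeat each midpoint once per student in that interval: 8 fives, 12 fifteens, ...
ungrouped = np.repeat(midpoints, counts)

fig, ax = plt.subplots()
ax.hist(ungrouped, bins=bins, edgecolor="black")
fig.savefig("ungrouped_hist.png")  # hypothetical output filename
```

With the same bin edges this draws the same bars as the weighted version, but it materializes one value per student, which grows with the total count.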

However, the “ungrouped” data will only be an approximation of the real data. Thus, it is probably better to just use a matplotlib.pyplot.bar plot.
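A sketch of the bar-plot alternative, again with the question's bins and counts hard-coded as an assumption: `align="edge"` starts each bar at its left bin edge and `width=np.diff(bins)` makes it span the whole bin, so neighbouring bars touch like histogram bars. The labels and the filename `grouped_bar.png` are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
import numpy as np

bins = np.array([0, 10, 20, 30, 40])
counts = np.array([8, 12, 24, 26])

fig, ax = plt.subplots()
# Left edge as x, bin width as bar width -> bars fill each interval exactly
bars = ax.bar(bins[:-1], counts, width=np.diff(bins), align="edge", edgecolor="black")
ax.set_xticks(bins)  # ticks on the bin boundaries rather than the bar centers
ax.set_xlabel("Marks")
ax.set_ylabel("Number of students")
fig.savefig("grouped_bar.png")  # hypothetical output filename
```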

Upvotes: 0
