Sophia

Reputation: 388

Python: Plot histograms with customized bins

I am using matplotlib.pyplot to make a histogram. Due to the distribution of the data, I want to set up the bins manually. The details are as follows:

  1. Any value equal to 0 goes in the first bin;
  2. Any value > 60 goes in the last bin;
  3. Any value > 0 and <= 60 goes in the bins between the two described above, and the bin size there is 5.

Could you please give me some help? Thank you.

Upvotes: 0

Views: 2120

Answers (3)

Tranbi

Reputation: 12701

IIUC you want a classic histogram for values between 0 (not included) and 60 (included), plus two extra bins on the sides for 0 and for values > 60.

In that case I would recommend plotting the 3 regions separately:

import matplotlib.pyplot as plt

data = [0, 0, 1, 2, 3, 4, 5, 6, 35, 60, 61, 82, -3] # your data here

fig, axes = plt.subplots(1,3, sharey=True, width_ratios=[1, 12, 1])
fig.subplots_adjust(wspace=0)

# counting 0 values and drawing a bar between -5 and 0
axes[0].bar(-5, data.count(0), width=5, align='edge')   
axes[0].xaxis.set_visible(False)
axes[0].spines['right'].set_visible(False)
axes[0].set_xlim((-5, 0))

# histogram between (0, 60]
axes[1].hist(data, bins=12, range=(0.0001, 60.0001))
axes[1].yaxis.set_visible(False)
axes[1].spines['left'].set_visible(False)
axes[1].spines['right'].set_visible(False)
axes[1].set_xlim((0, 60))

# counting values > 60 and drawing a bar between 60 and 65
axes[2].bar(60, len([x for x in data if x > 60]), width=5, align='edge')
axes[2].xaxis.set_visible(False)
axes[2].yaxis.set_visible(False)
axes[2].spines['left'].set_visible(False)
axes[2].set_xlim((60, 65))

plt.show()


Edit: If you want to plot a probability density, I would edit the data and simply use hist:

import matplotlib.pyplot as plt

data = [0, 0, 1, 2, 3, 4, 5, 6, 35, 60, 61, 82, -3] # your data here

# drop negative values and collapse everything above 60 into a single value (61)
data2 = [61 if el > 60 else el for el in data if el >= 0]

plt.hist(data2, bins=14, density=True, range=(-4.99,65.01))
plt.show()
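
To check that this range really produces 14 bins of width 5, and that the density integrates to 1, the same binning can be reproduced with np.histogram. This is only a minimal sketch, assuming integer data as in the example; it uses NumPy directly, which the code above does not import.

```python
import numpy as np

data = [0, 0, 1, 2, 3, 4, 5, 6, 35, 60, 61, 82, -3]

# same preprocessing as above: skip negatives, collapse values > 60 into 61
data2 = [61 if el > 60 else el for el in data if el >= 0]

# same bins/range as the plt.hist call
density, edges = np.histogram(data2, bins=14, range=(-4.99, 65.01), density=True)
counts, _ = np.histogram(data2, bins=14, range=(-4.99, 65.01))

widths = np.diff(edges)            # every bin should be (approximately) 5 wide
area = np.sum(density * widths)    # a density histogram should integrate to ~1

print(widths)
print(area)
print(counts[0], counts[-1])       # the "0" bin and the "> 60" bin
```

Because all bins have the same width, the bar heights of the "0" bin and the "> 60" bin stay directly comparable with the bars in between.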

Upvotes: 1

Robert

Reputation: 11

Building off Tranbi's answer, you could specify the bin edges explicitly, as detailed in the link they shared.

import matplotlib.pyplot as plt
import pandas as pd
data = [0, 0, 1, 2, 3, 4, 5, 6, 35, 60, 61, 82, -6] # your data here
df = pd.DataFrame()
df['data'] = data

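bin_edges = [-5, 0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
# shift the edges slightly so each bin is right-inclusive for integer data, e.g. (0, 5]
bin_edges_offset = [x+0.000001 for x in bin_edges]

plt.figure()
# clip so that values above 65 (e.g. 82) land in the last bin instead of being dropped
plt.hist(df['data'].clip(upper=65), bins=bin_edges_offset)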
plt.show()


Upvotes: 1

Tranbi

Reputation: 12701

I'm not sure what you mean by "the bin size is 5". You can plot a histogram by specifying the bins with a sequence:

import matplotlib.pyplot as plt
data = [0, 0, 1, 2, 3, 4, 5, 6, 35, 60, 61, 82, -5] # your data here
plt.hist(data, bins=[0, 0.5, 60, max(data)])
plt.show()

But each bar is drawn as wide as its interval, meaning, in this example, that the "0" case will be barely visible.

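(Note that 60 is moved to the last bin when specifying bins as a sequence, because every bin except the last one is half-open, i.e. it includes its left edge and excludes its right edge. For integer data, changing the sequence to [0, 0.5, 60.5, max(data)] would fix that.)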
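
The effect of the bin edges on the value 60 can be checked without plotting, since plt.hist relies on np.histogram for the counting. A minimal sketch; the 60.5 edge is an assumption that only makes sense for integer data like the example:

```python
import numpy as np

data = [0, 0, 1, 2, 3, 4, 5, 6, 35, 60, 61, 82, -5]

# Edge at exactly 60: the middle bin is [0.5, 60), so 60 lands in the last bin.
counts_edge_60, _ = np.histogram(data, bins=[0, 0.5, 60, max(data)])

# Edge at 60.5: the middle bin is [0.5, 60.5), so 60 stays in the middle bin.
counts_edge_60_5, _ = np.histogram(data, bins=[0, 0.5, 60.5, max(data)])

print(counts_edge_60)    # [2 7 3]
print(counts_edge_60_5)  # [2 8 2]
```

The negative value (-5) is below the first edge and is not counted in either case.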

What you (probably) need is first to categorize your data and then plot a bar chart of the categories:

import matplotlib.pyplot as plt
import pandas as pd

data = [0, 0, 1, 2, 3, 4, 5, 6, 35, 60, 61, 82, -5] # your data here

df = pd.DataFrame()
df['data'] = data

def find_cat(x):
    if x == 0:
        return "0"
    elif x > 60:
        return "> 60"
    elif x > 0:
        return "> 0 and <= 60"

df['category'] = df['data'].apply(find_cat)
df.groupby('category', as_index=False).count().plot.bar(x='category', y='data', rot=0, width=0.8)
plt.show()
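
If the bins of width 5 between 0 and 60 should also show up as separate bars, the same categorize-then-bar idea can be extended. This is only a sketch, assuming negative values should be ignored (the question does not mention them); `custom_bin_counts` and `plot_custom_bins` are hypothetical helper names, not part of matplotlib or pandas.

```python
import math
from collections import Counter


def custom_bin_counts(values, upper=60, step=5):
    """Count values into: exactly 0, (0, step], ..., (upper - step, upper], > upper."""
    n_middle = upper // step
    labels = ["0"]
    labels += [f"({i * step}, {(i + 1) * step}]" for i in range(n_middle)]
    labels.append(f"> {upper}")

    counts = Counter()
    for v in values:
        if v < 0:
            continue  # assumption: negative values are skipped
        if v == 0:
            counts[0] += 1
        elif v > upper:
            counts[n_middle + 1] += 1
        else:
            # ceil(v / step) maps (0, 5] -> 1, (5, 10] -> 2, ...
            counts[math.ceil(v / step)] += 1
    return labels, [counts[i] for i in range(n_middle + 2)]


def plot_custom_bins(values):
    """Draw one equally wide bar per category."""
    import matplotlib.pyplot as plt  # imported here so the counting works without it

    labels, counts = custom_bin_counts(values)
    fig, ax = plt.subplots()
    ax.bar(range(len(counts)), counts, width=1.0, edgecolor="black")
    ax.set_xticks(range(len(counts)))
    ax.set_xticklabels(labels, rotation=45, ha="right")
    ax.set_ylabel("count")
    fig.tight_layout()
    plt.show()


data = [0, 0, 1, 2, 3, 4, 5, 6, 35, 60, 61, 82, -5]  # your data here
labels, counts = custom_bin_counts(data)
print(list(zip(labels, counts)))
```

Calling plot_custom_bins(data) then draws the chart. Since every category gets a bar of the same width, the "0" and "> 60" bars stay as visible as the others, which is the main difference from passing unequal bin edges to plt.hist.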


Upvotes: 1
