cookie1986

Reputation: 895

Plot histogram with overflow bin in Pandas

I have some count-based data I would like to represent as a simple histogram. However, I would also like to group outlying points beyond a certain threshold into an 'overflow' bin. I'm unsure how to do this. Here is some sample data:

import numpy as np
import pandas as pd

nums = np.random.randint(1, 10, 100)  # 100 counts between 1 and 9
nums = np.append(nums, [80, 100])     # two outlying points

mydata = pd.DataFrame(nums)
mydata.hist(bins=20)

Example plot

In this case, I'd want to group anything larger than 10 into the same bin. I initially thought of setting every value beyond this threshold to the same value (e.g., 11), but I assume there is a more Pythonic way of doing this.

Upvotes: 2

Views: 4688

Answers (2)

Matt Hall

Reputation: 8122

If you don't want or need pandas in the solution, or want lots of flexibility, e.g. with the x-axis labels, then maybe this is a way to do it:

import numpy as np
import matplotlib.pyplot as plt

nums = np.random.randint(1, 10, 100)
nums = np.append(nums, [80, 100])

bins = [0, 5, 10, 100]
n, _ = np.histogram(nums, bins=bins)  # counts per bin; the last bin (10 to 100) catches the outliers
labels = [f'{a} to {b}' for a, b in zip(bins, bins[1:])]

fig, ax = plt.subplots()
bar = ax.bar(labels, n)
_ = ax.bar_label(bar)

This yields:

Example bar plot
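A variation on the same idea, as a minimal sketch under a few assumptions: cap every value at the threshold first, so that a single final bar collects the overflow while the smaller counts keep one bar each. The threshold of 10, the fixed seed, and the "10+" label are illustrative choices, not part of the answer above; np.minimum does the capping.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, assumed here only for reproducibility
nums = np.append(rng.integers(1, 10, 100), [80, 100])

threshold = 10  # assumed cut-off: everything >= 10 lands in the overflow bin

# Cap the outliers at the threshold so they all fall into the final bin.
capped = np.minimum(nums, threshold)

# Unit-width bins [1, 2), [2, 3), ..., [9, 10), [10, 11].
edges = np.arange(1, threshold + 2)
counts, _ = np.histogram(capped, bins=edges)

# One label per bin; the last one is marked as the overflow bin.
labels = [str(v) for v in edges[:-2]] + [f"{threshold}+"]
```

The resulting counts and labels can be passed to ax.bar(labels, counts) in the same way as in the snippet above.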

Upvotes: 0

Henrik Bo

Reputation: 433

You can use the pandas function pd.cut() to make custom bins:

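import numpy as np
import pandas as pd

nums = np.random.randint(1, 10, 100)
nums = np.append(nums, [80, 100])

mydata = pd.DataFrame(nums)

# Assign each value to (0, 5], (5, 10] or (10, 100]; the last interval acts as the overflow bin
mydata["bins"] = pd.cut(mydata[0], [0, 5, 10, 100])
# sort_index() keeps the bars in bin order instead of ordering them by count
mydata["bins"].value_counts().sort_index().plot.bar()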
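If the largest value is not known in advance, one possible tweak (a sketch, not part of the answer above) is to make the last edge open-ended with np.inf and to give the bins readable names through the labels argument of pd.cut. The label strings, the fixed seed, and the variable names below are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, assumed here only for reproducibility
values = pd.Series(np.append(rng.integers(1, 10, 100), [80, 100]))

# Right-inclusive bins (0, 5], (5, 10], (10, inf); the last one is the overflow bin.
edges = [0, 5, 10, np.inf]
labels = ["1 to 5", "6 to 10", "over 10"]  # hypothetical display names

binned = pd.cut(values, bins=edges, labels=labels)

# Count per bin, kept in bin order rather than sorted by frequency.
counts = binned.value_counts().sort_index()
```

Calling counts.plot.bar() then draws one bar per label, with every value above 10 gathered under "over 10".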


Upvotes: 1
