Reputation: 895
I have some count-based data I would like to represent as a simple histogram. However, I would also like to group outlying points beyond a certain threshold into an 'overflow' bin. I'm unsure how to do this. Here is some sample data:
import numpy as np
import pandas as pd
nums = np.random.randint(1, 10, 100)
nums = np.append(nums, [80, 100])
mydata = pd.DataFrame(nums)
mydata.hist(bins=20)
In this case, I'd want to group anything larger than 10 into the same bin. I initially thought of clamping every value beyond this threshold to a single value (e.g., 11), but I assume there is a more Pythonic way of doing this.
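For reference, a minimal sketch of that clamping idea, using only NumPy. The fixed seed, the name threshold, and the unit-width bin edges are assumptions made for illustration, not part of the original data setup:

```python
import numpy as np

# Fixed seed so the sketch is reproducible (an assumption, not in the original)
rng = np.random.default_rng(0)
nums = np.append(rng.integers(1, 10, 100), [80, 100])

threshold = 10  # hypothetical cutoff; anything above it is treated as overflow

# Clamp every value above the threshold to threshold + 1 (here, 11)
clamped = np.minimum(nums, threshold + 1)

# Unit-width bins centred on the integers 1..11; the last bin is the overflow bin
edges = np.arange(0.5, threshold + 2.5)
counts, _ = np.histogram(clamped, bins=edges)
```

Here counts[-1] holds the number of overflow values, and the remaining entries hold the counts for 1 through 10.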
Upvotes: 2
Views: 4688
Reputation: 8122
If you don't want or need pandas in the solution, or want lots of flexibility, e.g. with the x-axis labels, then maybe this is a way to do it:
import numpy as np
import matplotlib.pyplot as plt
nums = np.random.randint(1, 10, 100)
nums = np.append(nums, [80, 100])
bins = [0, 5, 10, 100]
n, _ = np.histogram(nums, bins=bins)
labels = [f'{a} to {b}' for a, b in zip(bins, bins[1:])]
fig, ax = plt.subplots()
bar = ax.bar(labels, n)
_ = ax.bar_label(bar)
This yields a bar chart with one bar per bin, each labelled with its count.
Upvotes: 0
Reputation: 433
You can use the pandas cut() function (pd.cut) to make custom bins:
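import numpy as np
import pandas as pd
nums = np.random.randint(1, 10, 100)
nums = np.append(nums, [80, 100])
mydata = pd.DataFrame(nums)
# Intervals are right-inclusive: (0, 5], (5, 10], (10, 100]
mydata["bins"] = pd.cut(mydata[0], [0, 5, 10, 100])
# value_counts() sorts by count, so sort_index() restores the bin order
mydata["bins"].value_counts().sort_index().plot.bar()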
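If the largest value is not known in advance, one possible variation is to make the last bin open-ended. The sketch below assumes np.inf as the final edge and uses made-up label strings ("1-5", "6-10", ">10"); the fixed seed is also an assumption added for reproducibility:

```python
import numpy as np
import pandas as pd

# Fixed seed so the sketch is reproducible (an assumption, not in the original)
rng = np.random.default_rng(0)
nums = np.append(rng.integers(1, 10, 100), [80, 100])

# The last edge is infinity, so every value above 10 lands in one overflow bin
edges = [0, 5, 10, np.inf]
labels = ["1-5", "6-10", ">10"]  # hypothetical labels for integer-valued data

binned = pd.cut(pd.Series(nums), bins=edges, labels=labels)

# sort_index() keeps the bars in bin order rather than in order of frequency
counts = binned.value_counts().sort_index()
```

Calling counts.plot.bar() then draws one bar per label, with the overflow bin last.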
Upvotes: 1