Shion

Reputation: 319

How to plot a histogram to get counts for all unique values?

I have a Pandas column whose values are recorded to a resolution of 0.0001.

I would like to plot a histogram that has one bar for each unique value, i.e. one bar per 0.0001 step.

I already get a lot of granularity with

plt.hist(df['data'], bins=500)

but I would like to see counts for each unique value.

How would I go about doing this? Thank you.

Upvotes: 0

Views: 4128

Answers (3)

G M

Reputation: 22449

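Wouldn't it be easier to use a Counter?

from collections import Counter
import matplotlib.pyplot as plt

cntr = Counter(df['data'])
# The default bar width is 0.8, far wider than the 0.0001 spacing, so set it explicitly
plt.bar(list(cntr.keys()), list(cntr.values()), width=0.0001)

This way you don't have to choose bin edges a priori; only the bar width has to match the spacing of the values.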
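The same counting idea can also be sketched with pandas alone. This is a minimal illustration, not part of the original answer: the sample data and the `step` and `counts` names are assumptions, and the `round(4)` call assumes the values really are multiples of 0.0001, so that floating-point noise cannot split one value into two keys.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical sample data: 25 distinct values spaced 0.0001 apart, 10 copies each
step = 0.0001
values = np.round(np.arange(5, 30) * step, 4)
df = pd.DataFrame({'data': np.repeat(values, 10)})

# Round to the known resolution, count each unique value, and sort by value
counts = df['data'].round(4).value_counts().sort_index()

# One bar per unique value; the bar width matches the spacing of the values
fig, ax = plt.subplots()
ax.bar(counts.index, counts.values, width=step, ec='w')
ax.set_xlabel('data')
ax.set_ylabel('count')
```

Call plt.show() or fig.savefig(...) afterwards to see the result. Values that never occur simply get no bar, so gaps in the data stay visible.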

Upvotes: 1

JohanC

Reputation: 80289

As your values are discrete, it is important to place the bin boundaries halfway between these values. If the boundaries coincide with the values, floating-point rounding can push a value into a neighboring bin. In the example below each value occurs 10 times, but the histogram with the boundaries on top of the values puts the last two values into the same bin:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({'data': np.repeat(np.arange(0.0005, 0.0030, 0.0001), 10)})

fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(15, 4))
ax1.hist(df['data'], bins=np.arange(df['data'].min(), df['data'].max(), 0.0001), ec='w')
ax1.set_title('bin boundaries on top of the values')
ax2.hist(df['data'], bins=np.arange(df['data'].min() - 0.00005, df['data'].max() + 0.0001, 0.0001), ec='w')
ax2.set_title('bin boundaries in-between the values')
plt.show()

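[Figure: two histograms side by side, titled 'bin boundaries on top of the values' (left) and 'bin boundaries in-between the values' (right)]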

Note that the version with the boundaries at the halves also puts the x-ticks nicely in the center of the bins.
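The half-step placement can also be checked numerically, without plotting. The sketch below is an illustration under stated assumptions: the sample data, and the `step`, `ticks` and `edges` names, are made up for this example, and the values are assumed to be multiples of 0.0001. It builds the edges from integer tick numbers, so the edge positions do not accumulate floating-point error the way a long np.arange with a fractional step can.

```python
import numpy as np

# Hypothetical sample data: 25 distinct values spaced 0.0001 apart, 10 copies each
step = 0.0001
data = np.repeat(np.arange(5, 30) * step, 10)

# Express every value as an integer number of steps
ticks = np.round(data / step).astype(int)

# Edges sit half a step below the smallest value and half a step above the largest
edges = (np.arange(ticks.min(), ticks.max() + 2) - 0.5) * step

# np.histogram uses the same binning rules as plt.hist
counts, _ = np.histogram(data, bins=edges)
print(counts)
```

The same `edges` array can be passed as bins=edges to plt.hist or ax.hist. Because every value lies in the middle of its bin, small rounding errors cannot move it across an edge.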

Upvotes: 2

Quang Hoang

Reputation: 150735

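Instead of specifying the number of bins (bins=500), you can specify the bin edges explicitly:

# Edges are shifted by half a step and extended past max() so the largest value gets its own bin
plt.hist(df['data'], bins=np.arange(df['data'].min() - 0.00005, df['data'].max() + 0.0001, 0.0001))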

Upvotes: 0
