Niek de Klein

Reputation: 8824

How to handle huge difference in values when plotting a histogram?

I have a list (intensityList) with 1354 numbers. They range from 25941.9 to 1639980000.0, so there is a very big difference, and I expect that most points are closer to 1639980000.0 than to 25941.9. When I make a histogram out of this:

import matplotlib.pyplot as plt

plt.hist(intensityList, 20)  # 20 equal-width bins between min and max
plt.title('Amount of features per intensity')
plt.xlabel('intensity')
plt.ylabel('frequency')
plt.show()

it puts almost all data in one bar and messes up the x-axis. It works with a test set (random normal numbers) so I'm pretty sure it has to do with the broad range. How can I deal with a dataset like this?

Edit: The data is likely very skewed; the standard deviation is much larger than the mean (mean = 6501401.54114, standard deviation = 49423145.7749).

Upvotes: 0

Views: 861

Answers (2)

Simon Bergot

Reputation: 10582

You can increase the number of bins, or keep only the values in a range you find interesting:

import numpy as np

intensityList = np.asarray(intensityList)  # boolean masks need a NumPy array, not a plain list
intensityList = intensityList[(intensityList > minVal) & (intensityList < maxVal)]
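Both suggestions can be combined in one step. The following is a minimal sketch, not the answerer's own code: the helper name histogram_in_range is hypothetical, and it uses numpy.histogram so the counts can be inspected without opening a plot. The bounds are exclusive, matching the masks above.

```python
import numpy as np

def histogram_in_range(values, min_val, max_val, bins=100):
    """Hypothetical helper: drop values outside (min_val, max_val), then histogram the rest."""
    values = np.asarray(values, dtype=float)  # boolean masks need an array, not a list
    kept = values[(values > min_val) & (values < max_val)]  # exclusive bounds on both sides
    counts, edges = np.histogram(kept, bins=bins)  # more bins -> finer resolution in the kept range
    return kept, counts, edges
```

The same kept array can then be passed to plt.hist with a larger bin count; the outliers are simply no longer part of the plot.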

Upvotes: 1

Niek de Klein

Reputation: 8824

Quite an obvious answer; it shows that it helps to write a question down. I took the log of the values and now it's all dandy.
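Taking the log can be done in two ways: histogram log10 of the values with ordinary bins, or keep the raw values and space the bin edges logarithmically. The sketch below shows both under stated assumptions: the helper log_spaced_bins is hypothetical, and the data is synthetic lognormal noise standing in for intensityList, not the poster's actual measurements. Both options require strictly positive values.

```python
import numpy as np

def log_spaced_bins(values, n_bins=20):
    """Hypothetical helper: n_bins + 1 edges evenly spaced in log10 from min to max."""
    values = np.asarray(values, dtype=float)
    if np.any(values <= 0):
        raise ValueError("logarithmic bins need strictly positive values")
    edges = np.logspace(np.log10(values.min()), np.log10(values.max()), n_bins + 1)
    # Pin the outer edges to the exact data range so float rounding in
    # 10**log10(x) cannot push the smallest or largest value out of the histogram.
    edges[0], edges[-1] = values.min(), values.max()
    return edges

# Synthetic, heavily right-skewed stand-in for intensityList (hypothetical data).
rng = np.random.default_rng(0)
intensities = rng.lognormal(mean=13.0, sigma=2.5, size=1354)

# Option 1: histogram of log10(intensity) with ordinary equal-width bins.
log_counts, log_edges = np.histogram(np.log10(intensities), bins=20)

# Option 2: raw intensities, but with logarithmically spaced bin edges.
edges = log_spaced_bins(intensities, 20)
raw_counts, _ = np.histogram(intensities, bins=edges)
```

For plotting Option 2, the edges can be passed as plt.hist(intensityList, bins=edges) followed by plt.xscale('log'), so the bars appear equally wide and the x-axis keeps the original intensity units. Option 1 is simpler, but its axis reads in powers of ten.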

Upvotes: 2
