etruhart314

Reputation: 1

Normalizing to bin height with matplotlib

I have a set of histograms, each made from a single column of a pandas DataFrame with the matplotlib.pyplot.hist function. The columns have different lengths, so I want to normalize each histogram. The built-in density option does not make sense for my data, so I want to divide each bin height by the maximum bin height.

Overall, I want to know how to (1) extract the bin heights from the histogram made by plt.hist, (2) divide all the bin heights by the maximum (I got confused by datatypes here; I think I'm trying to divide two tuples?), and (3) plot a new histogram with the normalized bin heights.

Ideally, I want to do this in a way that lets me tweak the number of bins in the original plot and then re-run to update both the original and the normalized plot.

I tried naming what the plt.hist function returns and then dividing by the max, but the only version of this that did not throw an error gave me a plot that made no sense. I think I divided the values I'm binning instead of the bin heights. I also don't really understand what n, bins, and patches are.

(n, bins, patches) = plt.hist(df['values'], bins=50)

plt.hist(df['values']/max(n), bins=50)

Upvotes: 0

Views: 97

Answers (1)

JohanC

Reputation: 80459

plt.hist() has 3 return values:

  • the counts of each bar (this is what is shown by default)
  • the edges between the bars (there are 51 edges for 50 bars)
  • the graphical elements that form the bars (rectangular patches)
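As a minimal sketch of the three return values listed above, the following inspects their lengths and normalizes the counts. The sample data and the fixed seed are assumptions made only for illustration. Note that the counts come back as a NumPy array, not a tuple, so they can be divided element-wise.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)      # fixed seed, only so the sketch is repeatable
values = rng.normal(size=1000)      # hypothetical sample data

fig, ax = plt.subplots()
counts, bin_edges, patches = ax.hist(values, bins=50)

print(len(counts))      # 50: one count per bar
print(len(bin_edges))   # 51: one more edge than there are bars
print(len(patches))     # 50: one rectangle per bar

# Normalizing means dividing the counts (bar heights), not the raw values.
heights = counts / counts.max()
print(heights.max())    # 1.0: the tallest bar now has height 1
```

Dividing `values` by `max(n)` instead, as in the question, only rescales the x-axis; the bar heights stay the same.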

To use the return values again, you need to create a bar plot, not a histogram. A new histogram would bin the 50 counts again into new counts.

import matplotlib.pyplot as plt
import numpy as np

plt.figure()
values = np.random.randn(10000).cumsum()
counts, bin_edges, _bars = plt.hist(values, bins=50)
plt.xlabel('Values')
plt.ylabel('Counts')
plt.show()

[Figure: original histogram drawn by plt.hist()]

plt.figure()
plt.bar(bin_edges[:-1], counts / counts.max(), width=np.diff(bin_edges), align='edge')
plt.xlabel('Values')
plt.ylabel('Fraction of highest bar')
plt.show()

[Figure: histogram normalized by the highest bar]

Drawing the original histogram can be skipped by calling np.histogram() instead. It has the same return values, except for the graphical elements. Here is what a standalone example could look like, with the y-axis formatted as percentages:

import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import numpy as np

values = np.random.randn(10000).cumsum()
counts, bin_edges = np.histogram(values, bins=50)
plt.bar(bin_edges[:-1], counts / counts.max(), width=np.diff(bin_edges), align='edge')
plt.xlabel('Values')
plt.ylabel('Percentage vs highest bar')
plt.gca().yaxis.set_major_formatter(PercentFormatter(1))
plt.show()

[Figure: normalized histogram with the y-axis shown as percentages]
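For the setup described in the question (several DataFrame columns of different lengths, and a bin count that can be tweaked and re-run), one possible arrangement is sketched below. The names `N_BINS`, `max_normalized_hist`, `plot_original_and_normalized`, and the columns `a` and `b` are hypothetical, and the sketch assumes each column has at least one non-missing value. Shorter columns in a DataFrame are padded with NaN, which np.histogram() cannot bin, so missing values are dropped first.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

N_BINS = 50  # change this and re-run to update both plots


def max_normalized_hist(series, bins):
    """Return (heights, bin_edges), with heights scaled so the tallest bin is 1."""
    data = series.dropna().to_numpy()           # drop NaN padding from shorter columns
    counts, bin_edges = np.histogram(data, bins=bins)
    return counts / counts.max(), bin_edges


def plot_original_and_normalized(series, bins):
    """Draw the raw histogram and the max-normalized bars side by side."""
    fig, (ax_raw, ax_norm) = plt.subplots(1, 2, figsize=(10, 4))
    ax_raw.hist(series.dropna(), bins=bins)
    ax_raw.set_ylabel('Counts')
    heights, edges = max_normalized_hist(series, bins)
    ax_norm.bar(edges[:-1], heights, width=np.diff(edges), align='edge')
    ax_norm.set_ylabel('Fraction of highest bar')
    return fig


# Hypothetical data: two columns with different numbers of valid entries.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'a': pd.Series(rng.normal(size=1000)),
    'b': pd.Series(rng.normal(size=300)),       # padded with NaN up to 1000 rows
})

figures = {col: plot_original_and_normalized(df[col], N_BINS) for col in df.columns}
```

Calling plt.show() afterwards displays one figure per column. Because both panels are built from the same `bins` argument, changing `N_BINS` keeps the original and normalized plots in sync.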

Upvotes: 0
