Reputation: 1
I have a set of histograms, each made from a single column of a pandas DataFrame with the matplotlib.pyplot.hist function. The data sets have different lengths, so I want to normalize each histogram. The built-in density option does not make sense for my data, so instead I want to divide each bin height by the maximum bin height.
Overall I want to know how to 1) extract the bin heights from the histogram made by plt.hist, 2) divide all the bin heights by the maximum (I got confused by datatypes here; I think I'm trying to divide two tuples?), and 3) plot a new histogram with the normalized bin heights.
Ideally I want to do this in an order where I can tweak my choice of bin number in the original plot and then re-run to update both the original and normalized plot.
I tried naming what the plt.hist function returns and then dividing by the max, but the only version of this that did not throw an error gave me a plot that made no sense. I think I divided the values I'm binning instead of the bin heights. I also don't really understand what n, bins, and patches are.
(n, bins, patches) = plt.hist(df['values'], bins=50)
plt.hist(df['values']/max(n), bins=50)
Upvotes: 0
Views: 97
Reputation: 80459
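plt.hist() has 3 return values: the bin heights (n, an array with one count per bin), the bin edges (bins, an array with one more entry than there are bins), and the graphical elements that were drawn (patches).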
To use the return values again, you need to create a bar plot, not a histogram. A new histogram would bin the 50 counts again into new counts.
import matplotlib.pyplot as plt
import numpy as np
plt.figure()
values = np.random.randn(10000).cumsum()
counts, bin_edges, _bars = plt.hist(values, bins=50)
plt.xlabel('Values')
plt.ylabel('Counts')
plt.show()
plt.figure()
plt.bar(bin_edges[:-1], counts / counts.max(), width=np.diff(bin_edges), align='edge')
plt.xlabel('Values')
plt.ylabel('Percentage vs highest bar')
plt.show()
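To make the datatypes concrete, here is a minimal sketch that inspects what plt.hist() returns. The random data and the non-interactive Agg backend are assumptions for illustration only, not part of the question. It shows that the counts come back as a NumPy array (not a tuple), so dividing by counts.max() works elementwise:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)   # illustrative data, not from the question
values = rng.normal(size=1000)

counts, bin_edges, bars = plt.hist(values, bins=50)

print(type(counts).__name__, counts.shape)        # NumPy array, one count per bin
print(type(bin_edges).__name__, bin_edges.shape)  # one more edge than there are bins
print(len(bars))                                  # one drawn rectangle per bin

# Elementwise division: every bin height is scaled by the tallest bin.
normalized = counts / counts.max()
print(normalized.max())

plt.close("all")
```

Passing df['values'] / max(n) to plt.hist() instead rescales the data being binned, which is why that attempt produced a histogram with the same shape on a different x-axis rather than normalized heights.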
Drawing the original histogram can be skipped by calling np.histogram() instead. It has the same return values, except for the graphical elements. Here is what a standalone code example could look like, with the y-axis formatted as percentages:
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import numpy as np
values = np.random.randn(10000).cumsum()
counts, bin_edges = np.histogram(values, bins=50)
plt.bar(bin_edges[:-1], counts / counts.max(), width=np.diff(bin_edges), align='edge')
plt.xlabel('Values')
plt.ylabel('Percentage vs highest bar')
plt.gca().yaxis.set_major_formatter(PercentFormatter(1))
plt.show()
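For several data sets of different lengths, the same idea could be wrapped in a small helper so that the bin count is set in one place and everything updates on re-run. This is only a sketch: the function names peak_normalized_hist and plot_peak_normalized, the random sample data, and the Agg backend are all hypothetical choices made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np


def peak_normalized_hist(values, bins=50):
    """Return (heights, bin_edges) with heights scaled so the tallest bin is 1."""
    counts, bin_edges = np.histogram(np.asarray(values), bins=bins)
    return counts / counts.max(), bin_edges


def plot_peak_normalized(ax, values, bins=50, **bar_kwargs):
    """Draw the peak-normalized histogram of `values` as a bar plot on `ax`."""
    heights, bin_edges = peak_normalized_hist(values, bins=bins)
    ax.bar(bin_edges[:-1], heights, width=np.diff(bin_edges),
           align='edge', **bar_kwargs)
    return heights, bin_edges


# Illustrative data sets of different lengths (not from the question).
rng = np.random.default_rng(1)
short = rng.normal(size=200)
long = rng.normal(loc=1.0, size=5000)

n_bins = 30  # change this once and re-run to update every plot
fig, ax = plt.subplots()
h_short, _ = plot_peak_normalized(ax, short, bins=n_bins, alpha=0.5, label='short')
h_long, _ = plot_peak_normalized(ax, long, bins=n_bins, alpha=0.5, label='long')
ax.set_xlabel('Values')
ax.set_ylabel('Fraction of highest bar')
ax.legend()
# Call plt.show() or fig.savefig(...) here to view the result.
```

A pandas column such as df['values'] can be passed in place of the arrays, since it is converted with np.asarray(). Missing values would probably need to be dropped first, for example with .dropna().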
Upvotes: 0