Josh Kidd

Reputation: 870

How to plot a CDF over a histogram in matplotlib

I have a script that plots a relative-frequency histogram from a pandas Series. The code is:

import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import FuncFormatter

def to_percent3(y, position):
    # tick formatter: turns a fraction such as 0.03 into the label '3.0%'
    s = str(100 * y)
    if matplotlib.rcParams['text.usetex']:
        return s + r'$\%$'
    else:
        return s + '%'

df = pd.read_csv('mycsv.csv')

waypointfreq = df['Waypoint Frequency(Secs)']
perctile = np.percentile(waypointfreq, 95)  # calculates the 95th percentile
bins = np.arange(0, perctile + 1, 1)  # bin edges 0, 1, 2, ... covering 0 to the 95th percentile
plt.hist(waypointfreq, bins=bins, density=True)  # density replaces 'normed', which newer Matplotlib no longer accepts
formatter = FuncFormatter(to_percent3)  # changes y axis to percent
plt.gca().yaxis.set_major_formatter(formatter)
plt.axis([0, perctile, 0, 0.03])  # x from 0 to the 95th percentile, y from 0 to 3% relative frequency
plt.xlabel('Waypoint Frequency(Secs)')
plt.xticks(np.arange(0, perctile, 15.0))
plt.title('Relative Frequency of Average Waypoint Frequency')
plt.grid(True)
plt.show()
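For reference, Matplotlib ships matplotlib.ticker.PercentFormatter, which covers what to_percent3 does: xmax=1 maps a fraction to a percentage, and it escapes the % sign when text.usetex is on. It also fixes the number of decimals, whereas str(100 * y) can show floating-point noise (in Python, 100 * 0.07 evaluates to 7.000000000000001). A minimal sketch, with an empty figure standing in for the histogram:

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

fig, ax = plt.subplots()
ax.set_ylim(0, 0.03)  # same y-range as the histogram above

# xmax=1: a value of 1.0 is labelled "100%"; decimals=1 fixes the precision
formatter = PercentFormatter(xmax=1, decimals=1)
ax.yaxis.set_major_formatter(formatter)

# Once attached to an axis, the formatter can also be called directly
labels = [formatter(v) for v in (0.0, 0.015, 0.03)]
print(labels)
```

Swapping it in for FuncFormatter(to_percent3) should leave the rest of the plotting code unchanged.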

It produces a plot that looks like this:

[Image: the resulting histogram, "Relative Frequency of Average Waypoint Frequency"]

What I'd like is to overlay a line showing the CDF on this plot, drawn against a secondary y-axis. I know I can create the cumulative graph on its own with:

waypointfreq = df['Waypoint Frequency(Secs)']
perctile = np.percentile(waypointfreq, 95)  # calculates the 95th percentile
bins = np.arange(0, perctile + 5, 1)  # bin edges in steps of 1, running slightly past the 95th percentile
plt.hist(waypointfreq, bins=bins, density=True, histtype='stepfilled', cumulative=True)
formatter = FuncFormatter(to_percent3)  # changes y axis to percent
plt.gca().yaxis.set_major_formatter(formatter)
plt.axis([0, perctile, 0, 1])  # x from 0 to the 95th percentile, y from 0 to 100% cumulative frequency
plt.xlabel('Waypoint Frequency(Secs)')
plt.xticks(np.arange(0, perctile, 15.0))
plt.title('Cumulative Frequency of Average Waypoint Frequency')
plt.grid(True)
plt.savefig(r'output\4 Cumulative Frequency of Waypoint Frequency.png', bbox_inches='tight')
plt.show()

However, this is drawn as a separate graph instead of on top of the previous one. Any help or insight would be appreciated.
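A detail worth knowing about the cumulative plot: with density=True, Matplotlib's hist normalises using only the values that fall inside the bins (it bins with numpy.histogram, which ignores values outside the bin range), and with cumulative=True the last bin then equals 1. So a cumulative histogram whose bins stop near the 95th percentile climbs to 100% at the last bin rather than to about 95%. The sketch below reproduces that calculation with NumPy; the helper name and the demo data are hypothetical.

```python
import numpy as np


def cumulative_relative_frequency(values, bins):
    """Cumulative fraction of the in-range values at each right bin edge.

    Hypothetical helper mirroring plt.hist(..., density=True, cumulative=True):
    values outside [bins[0], bins[-1]] are ignored before normalising.
    """
    density, edges = np.histogram(values, bins=bins, density=True)
    # density * bin width = fraction of in-range values in each bin
    return np.cumsum(density * np.diff(edges)), edges[1:]


# Hypothetical data: the integers 0..99, binned only from 0 to 50
values = np.arange(100)
cum, right_edges = cumulative_relative_frequency(values, np.arange(0, 51, 1))

share_binned = np.mean(values <= 50)  # fraction of ALL values inside the bins
print(cum[-1], share_binned)
```

Here the cumulative curve ends at 1.0 even though only about half of the data lies inside the bins. If the curve should reach only about 95% at the cutoff, compute the cumulative share over all values rather than over the binned range.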

Upvotes: 2

Views: 9465

Answers (1)

Moritz

Reputation: 5408

Maybe this code snippet helps:

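import numpy as np
from scipy.integrate import cumulative_trapezoid  # older SciPy versions call this cumtrapz
from scipy.stats import norm
from matplotlib import pyplot as plt

n = 1000
x = np.linspace(-3, 3, n)
data = norm.rvs(size=n)          # n samples from a standard normal
data = data + abs(min(data))     # shift so all values are non-negative
data = np.sort(data)

# cumulative integral of the sorted values over x, rescaled so it ends at 1
cdf = cumulative_trapezoid(x=x, y=data)
cdf = cdf / max(cdf)

fig, ax = plt.subplots(ncols=1)
ax1 = ax.twinx()                 # second y-axis sharing the same x-axis
ax.hist(data, density=True, histtype='stepfilled', alpha=0.2)
ax1.plot(data[1:], cdf)          # cumulative_trapezoid returns n - 1 values
plt.show()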
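A sketch of the same twinx idea applied to the question's setup. Instead of integrating the sorted values, it draws the empirical CDF (the i-th smallest of n values is plotted at height i/n), which needs no x grid and is computed over all values, not just the binned ones. The exponential demo data standing in for the CSV column and the helper name ecdf are assumptions. An alternative is to call ax_cdf.hist(values, bins=bins, density=True, cumulative=True, histtype='step') on the twin axis, reusing the question's cumulative histogram.

```python
import numpy as np
import matplotlib.pyplot as plt


def ecdf(values):
    """Empirical CDF: sorted values and the fraction of values <= each one."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y


# Hypothetical stand-in for df['Waypoint Frequency(Secs)']
rng = np.random.default_rng(0)
values = rng.exponential(scale=30.0, size=1000)
p95 = np.percentile(values, 95)
bins = np.arange(0, p95 + 1, 1)

fig, ax = plt.subplots()
ax.hist(values, bins=bins, density=True, alpha=0.3)  # relative frequency, left axis

ax_cdf = ax.twinx()  # right y-axis sharing the same x-axis
x, y = ecdf(values)
ax_cdf.step(x, y, where="post", color="tab:red")  # CDF over all values
ax_cdf.set_ylim(0, 1)
ax_cdf.set_ylabel("Cumulative fraction")

ax.set_xlim(0, p95)  # also applies to ax_cdf, since the x-axis is shared
```

Call plt.show() (or fig.savefig(...)) to display it. Because the empirical CDF is computed over every value, it reaches roughly 95% at the right edge of this plot, matching the 95th-percentile cutoff.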

If your CDF is not smooth, you could fit a distribution to the data.
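One way to follow that suggestion, as a minimal sketch: it assumes scipy.stats is available and that a normal distribution is a reasonable model (for skewed, positive data such as durations, stats.expon or stats.lognorm may fit better; the choice is an assumption). dist.fit returns maximum-likelihood parameters, and dist.cdf evaluates the fitted CDF as a smooth curve on the twin axis. The helper name overlay_fitted_cdf is hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats


def overlay_fitted_cdf(ax, values, dist=stats.norm, num=200):
    """Fit `dist` to `values` and draw its CDF on a twin y-axis of `ax`.

    `dist` is an assumption: any scipy.stats continuous distribution works.
    """
    params = dist.fit(values)  # maximum-likelihood shape/loc/scale
    grid = np.linspace(np.min(values), np.max(values), num)
    ax_cdf = ax.twinx()  # right y-axis sharing the x-axis of `ax`
    ax_cdf.plot(grid, dist.cdf(grid, *params), color="tab:red")
    ax_cdf.set_ylim(0, 1)
    return ax_cdf, params


# Hypothetical demo data, normally distributed as in the snippet above
rng = np.random.default_rng(0)
values = rng.normal(size=1000)

fig, ax = plt.subplots()
ax.hist(values, bins=30, density=True, alpha=0.3)
ax_cdf, params = overlay_fitted_cdf(ax, values)
```

For a normal fit, params is (loc, scale), i.e. the sample mean and standard deviation.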


Upvotes: 5
