Reputation: 21343
I would like to draw samples from a probability distribution with CDF 1 - e^(-x^2)
.
Is there a method in python/scipy/etc. to enable you to sample from a probability distribution given only its CDF?
Upvotes: 4
Views: 7676
Reputation: 21
Using interpolation when the analytical form might not be known,
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 3, 512)
cdf = 1 - np.exp(-x**2)
cdf_y_samples = np.random.uniform(cdf[0], cdf[-1], size=int(1e5))
cdf_x_samples = np.interp(cdf_y_samples, cdf, x) # apply inverse CDF
plt.figure(1); plt.clf()
plt.hist(cdf_x_samples, bins=50, density=True)
Upvotes: 0
Reputation: 75
My answer builds off of the answer from @Heike and @Jeff N 's answers (which were both correct, don't get me wrong). However, I had to use a mix of defining both the rv_continuous._ppf
, and rv_continuous._pdf
, to handle larger sample values, and this solution is quite fast also.
For my case, I had to simulate random variables for over 50,000 values. When only rv_continuous._pdf
was defined, the output took very long, and gave a ValueError, encountering a NaN value, and stopping the runner.
When only rv_continuous._ppf
was defined, the line graph of the function had issues plotting, as it surpassed the recursion limit in Python (although it is possible to set a higher recursion limit, it did not fix that issue for me).
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
class dist_gen(stats.rv_continuous):
def __init__(self, seed=233423):
super().__init__(seed=seed)
# Both _pdf and _ppf overriden in scipy.stats.rv_continuous
def _pdf(self, x):
return 0.5*np.exp(-1*(abs(x)))
def _ppf(self, x):
return -1*np.sign(x-0.5)*np.log(1-(2*abs(x-0.5)))
if __name__ == "__main__":
# Number of variables to be simulated
N = 100000
dist = dist_gen()
samples = dist.rvs(size=N)
# Plot histogram
fig, ax = plt.subplots()
ax.hist(samples, bins=200, density=True)
# Plotting the line function
domain = np.linspace(-7.5, 7.5)
ax.set_ylim(0, 0.75)
ax.plot(domain, dist.pdf(domain), color='purple')
# Showing the final plot
plt.show()
Upvotes: 1
Reputation: 65
To add on to the solution by Heike, you could use Inverse Transform Sampling to sample via the CDF:
import math, random
import matplotlib.pyplot as plt
def inverse_cdf(y):
# Computed analytically
return math.sqrt(math.log(-1/(y - 1)))
def sample_distribution():
uniform_random_sample = random.random()
return inverse_cdf(uniform_random_sample)
x = [sample_distribution() for i in range(10000)]
plt.hist(x, bins=50)
plt.show()
I was very curious to see how this worked in SciPy, too. It actually looks like it does something very similar to the above. Based on the SciPy docs:
The default method _rvs relies on the inverse of the cdf, _ppf, applied to a uniform random variate. In order to generate random variates efficiently, either the default _ppf needs to be overwritten (e.g. if the inverse cdf can expressed in an explicit form) or a sampling method needs to be implemented in a custom _rvs method.
And based on the SciPy source code, the _ppf
(i.e., the inverse of the CDF) does in fact look to be approximated numerically if not specified explicitly. Very cool!
Upvotes: 5
Reputation: 24430
To create a custom random variable class given a CDF you could subclass scipy.rv_continuous
and override rv_continuous._cdf
. This will then automatically generate the corresponding PDF and other statistical information about your distribution, e.g.
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
class MyRandomVariableClass(stats.rv_continuous):
def __init__(self, xtol=1e-14, seed=None):
super().__init__(a=0, xtol=xtol, seed=seed)
def _cdf(self, x):
return 1-np.exp(-x**2)
if __name__ == "__main__":
my_rv = MyRandomVariableClass()
# sample distribution
samples = my_rv.rvs(size = 1000)
# plot histogram of samples
fig, ax1 = plt.subplots()
ax1.hist(list(samples), bins=50)
# plot PDF and CDF of distribution
pts = np.linspace(0, 5)
ax2 = ax1.twinx()
ax2.set_ylim(0,1.1)
ax2.plot(pts, my_rv.pdf(pts), color='red')
ax2.plot(pts, my_rv.cdf(pts), color='orange')
fig.tight_layout()
plt.show()
Upvotes: 9