Reputation: 115
I have a one dimensional dataset from which I have calculated the kernel density estimate, using statsmodels and also with scipy, in python.
I want to generate a set of random numbers to see if they give me the same distribution or not.
The solution I could find so far was this :
To draw a number x between 0 and 1 and return cdf^{-1}(x), where cdf^{-1} is the inverse cumulative distribution function of 'f'.
Problem with this is, using statsmodel I am able to find the inverse cumulative distribution, but in the form of a matrix. So multiplying a number x with the matrix is not giving me a random sample from the distribution of the calculated KDE.
How can I generate a random number from the given KDE function or distribution?
EDIT: Here is a code sample:
def calcKDE(data):
#Calculating KDE
kde = sm.nonparametric.KDEUnivariate(data)
kde.fit(kernel='gau', bw='silverman')
return kde
def generateData(data, kde):
inverse_cdf = kde.icdf // this is a method which takes no parameters, and so is kde.cdf
randomSet = np.random.random(1) * inverse_cdf // inverse_cdf is taken as a matrix, will also add a loop here to return 1000 random values
return randomSet
Upvotes: 3
Views: 3326
Reputation: 33532
The inverse-transform-sampling approach looks like this:
from scipy.optimize import brentq
import statsmodels.api as sm
import numpy as np
# fit
kde = sm.nonparametric.KDEMultivariate() # ... you already did this
# sample
u = np.random.random()
# 1-d root-finding
def func(x):
return kde.cdf([x]) - u
sample_x = brentq(func, -99999999, 99999999) # read brentq-docs about these constants
# constants need to be sign-changing for the function
I recommend implementing this class-based, with a customized sample-function for better usage.
This approach is also very generic and there are much much faster approaches using additional information about the kernels in use (no inverse-transform sampling). You should find some examples by googling.
Another remark:
sample-function
out of the boxMy ordering of kde-tools from best to worst (my opinion):
Upvotes: 6