Reputation: 723
I am fitting a beta distribution with beta.fit(W). The values of W do not reach the [0, 1] boundaries. My question is the following: do I need to force [0, 1] bounds with beta.fit(W, loc=min(W), scale=max(W) - min(W)), or may I assume that as long as the data is within the [0, 1] range, the fitting "will be fine"? Obviously, scaling the data should give different values of a and b. Which one is the "correct one"?
This question is related to: https://stats.stackexchange.com/questions/68983/beta-distribution-fitting-in-scipy
Unfortunately, no valid answer on what to do when the data is within the expected range is give...
I tried to fit data generated with known values of a and b and neither technique gave a good fit, although scaling seemed to help a bit.
Thanks
Upvotes: 2
Views: 1938
Reputation: 1461
When you do not pass the floc and fscale parameters, fit tries to estimate the location and scale as well. If you know that the data lie in a specific interval, you should pass that information to fit by fixing these parameters yourself (for data on [0, 1], floc=0 and fscale=1), which improves the fit. You can also give initial guesses for α and β (as positional arguments after the data) and for the location and scale (via the loc and scale keyword arguments); SciPy's default guessing function seems to be quite sophisticated, though.
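A minimal sketch of the two approaches, assuming synthetic data drawn from a beta distribution on [0, 1]. The shape parameters a = 2 and b = 5, the sample size, and the variable names are illustrative assumptions, not values from the question:

```python
import numpy as np
from scipy import stats

# Hypothetical test data: a beta sample with known shape parameters.
rng = np.random.default_rng(0)
a_true, b_true = 2.0, 5.0
W = stats.beta.rvs(a_true, b_true, size=5000, random_state=rng)

# Free fit: loc and scale are estimated together with a and b.
a_free, b_free, loc_free, scale_free = stats.beta.fit(W)

# Constrained fit: the support is fixed to the known interval [0, 1],
# so only a and b are estimated.
a_fix, b_fix, loc_fix, scale_fix = stats.beta.fit(W, floc=0, fscale=1)

print("free fit: ", a_free, b_free, loc_free, scale_free)
print("fixed fit:", a_fix, b_fix, loc_fix, scale_fix)
```

Note that loc and scale (without the f prefix) only set starting values for the optimizer, whereas floc and fscale hold the parameters fixed.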
Deriving floc and fscale from the limits of the sample set is not a good idea. The beta density is zero at the interval boundaries whenever α > 1 and β > 1, so placing the smallest and largest observations exactly on the boundaries creates large discrepancies between the data and most candidate fits.
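The boundary problem can be illustrated with a short sketch. It again assumes a hypothetical sample from a beta distribution with a = 2 and b = 5 (illustrative values only) and compares the log-likelihood of the original data with that of the data rescaled to its own minimum and maximum:

```python
import numpy as np
from scipy import stats

# Hypothetical sample; the shape parameters are illustrative assumptions.
rng = np.random.default_rng(0)
a, b = 2.0, 5.0
W = stats.beta.rvs(a, b, size=1000, random_state=rng)

# Rescaling by the sample limits puts min(W) at exactly 0 and max(W) at exactly 1.
W_scaled = (W - W.min()) / (W.max() - W.min())

# Log-likelihood under the same beta(a, b) on the standard [0, 1] support.
ll_original = stats.beta.logpdf(W, a, b).sum()
ll_scaled = stats.beta.logpdf(W_scaled, a, b).sum()

print("original data:", ll_original)
print("rescaled data:", ll_scaled)
```

Because the density is zero at 0 and 1 for these shape parameters, the two rescaled extreme points contribute a log-density of negative infinity, so the rescaled log-likelihood is negative infinity for any α > 1 and β > 1.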
Upvotes: 1