Reputation: 18728
Using scipy, I'd like to measure how likely it is that an observed value was generated by my log-normal distribution.
To do this, I've considered looking at how far it is from the maximum of the PDF.
My approach so far is this: if the value is r = 1.5 and the distribution has σ = 0.5, find the value of the PDF, lognorm.pdf(r, 0.5, loc=0). Given the result (0.38286...), I would then like to look up what area of the PDF lies below 0.38286...
How can this last step be implemented? Is this even the right way to approach this problem?
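One reading of that last step is "the total probability of all outcomes whose density is lower than the density at r". A minimal sketch, assuming loc=0 and the default scale=1 as in the question: in y = ln(x) the log-normal log-density is a parabola with its vertex at y = -σ², so the other point with the same density as r is r' = exp(-2σ²)/r, and the area is cdf(min(r, r')) + sf(max(r, r')). The helper name `density_tail_area` is hypothetical, not part of scipy.

```python
import numpy as np
from scipy.stats import lognorm


def density_tail_area(r, s):
    """Probability mass where lognorm.pdf(x, s) is below lognorm.pdf(r, s).

    Assumes loc=0 and scale=1 (the scipy defaults used in the question).
    """
    # In y = ln(x) the log-density is a parabola centred on y = -s**2,
    # so the other point with the same density mirrors ln(r) about -s**2.
    r_other = np.exp(-2 * s**2) / r
    lo, hi = min(r, r_other), max(r, r_other)
    # Mass outside [lo, hi], i.e. where the density is lower than at r.
    return lognorm.cdf(lo, s) + lognorm.sf(hi, s)


r, s = 1.5, 0.5
r_other = np.exp(-2 * s**2) / r
print(lognorm.pdf(r, s), lognorm.pdf(r_other, s))  # the two densities match
print(density_tail_area(r, s))
```

Because this ranks outcomes by density rather than by distance from the median, it generally gives a different number from the tail-probability p-values; at the mode (r = exp(-σ²)) it returns 1.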
To give a more general example of the problem: say someone tells me they have 126 followers on Twitter. I know that Twitter follower counts follow a log-normal distribution, and I have the PDF of that distribution. Given that distribution, how do I determine how believable this number of followers is?
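For the follower-count example, scipy's lognorm takes s = σ and scale = exp(μ), where μ and σ are the mean and standard deviation of ln(X). A minimal sketch with hypothetical parameters (the μ and σ below are made up for illustration, not fitted to real Twitter data), computing the smaller tail probability for 126 followers:

```python
import numpy as np
from scipy.stats import lognorm

# Hypothetical parameters of ln(followers); with real data they could be
# estimated with e.g. lognorm.fit(samples, floc=0).
mu, sigma = np.log(200), 1.5
dist = lognorm(s=sigma, scale=np.exp(mu))

followers = 126
# Smaller of the two tail probabilities: a small value means 126 is unusual.
tail = min(dist.cdf(followers), dist.sf(followers))
print(tail)
```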
Upvotes: 1
Views: 420
Reputation: 22897
Same result as hayden's answer.
For statistical tests with an asymmetric distribution, we get the p-value by taking the minimum of the two tail probabilities:
>>> from scipy.stats import lognorm
>>> r = 1.5
>>> 0.5 - abs(lognorm.cdf(r, 0.5, loc=0) - 0.5)
0.20870287338447135
>>> min((lognorm.cdf(r, 0.5), lognorm.sf(r, 0.5)))
0.20870287338447135
This is usually doubled to get the two-sided p-value, but there are some recent papers that suggest alternatives to the doubling.
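A minimal sketch of the usual doubling; `two_sided_pvalue` is a hypothetical helper name, and the cap at 1 only guards against rounding, since min(cdf, sf) is at most 0.5:

```python
from scipy.stats import lognorm


def two_sided_pvalue(r, s):
    # Double the smaller tail probability; min(cdf, sf) <= 0.5, so the cap only guards rounding.
    return min(1.0, 2 * min(lognorm.cdf(r, s), lognorm.sf(r, s)))


print(two_sided_pvalue(1.5, 0.5))  # twice 0.20870287338447135
```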
Upvotes: 0
Reputation: 375377
The area under the PDF to the left of r is the CDF (which is conveniently a method of lognorm):
lognorm.cdf(r, 0.5, loc=0)
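To see that the CDF really is the area under the PDF, a small sketch that integrates the density numerically with scipy.integrate.quad from 0 (the start of the support when loc=0) up to r and compares the two:

```python
from scipy.integrate import quad
from scipy.stats import lognorm

r, s = 1.5, 0.5
# Numerically integrate lognorm.pdf(x, s) over [0, r].
area, abserr = quad(lognorm.pdf, 0, r, args=(s,))
print(area, lognorm.cdf(r, s))  # agree to integration accuracy
```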
One thing you can use this to calculate is the Folded Cumulative Distribution (mentioned here), also known as a "mountain plot":
FCD = 0.5 - abs(lognorm.cdf(r, 0.5, loc=0) - 0.5)
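A small sketch evaluating the folded CDF on a grid, assuming the same σ = 0.5 and the default scale = 1. It peaks at 0.5 at the median (x = 1 here) and equals min(cdf, sf) everywhere, the same quantity as the minimum of the two tail probabilities; `folded_cdf` is a hypothetical helper name.

```python
import numpy as np
from scipy.stats import lognorm


def folded_cdf(x, s):
    # Fold the CDF at 0.5: follows the CDF below the median and the SF above it.
    return 0.5 - np.abs(lognorm.cdf(x, s) - 0.5)


xs = np.linspace(0.1, 4.0, 40)
fcd = folded_cdf(xs, 0.5)
# Same values as the smaller of the two tail probabilities.
print(np.allclose(fcd, np.minimum(lognorm.cdf(xs, 0.5), lognorm.sf(xs, 0.5))))
print(folded_cdf(1.0, 0.5))  # 0.5: the peak, at the median
```

Plotting fcd against xs gives the "mountain plot" shape.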
Upvotes: 2