lyschoening

Reputation: 18728

Lognormal distributed variable, find likelihood

Using scipy, I'd like to get a measure of how likely it is that a random variable was generated by my log-normal distribution.

To do this I've considered looking at how far it is from the maximum of the PDF.

My approach so far is this: if the variable is r = 1.5 and the distribution has σ = 0.5, evaluate the PDF at that point with lognorm.pdf(r, 0.5, loc=0). Given the result (0.38286...), I would then like to look up what area of the PDF is below 0.38286...

How can this last step be implemented? Is this even the right way to approach this problem?

To give a more general example of the problem: say someone tells me they have 126 followers on Twitter. I know that Twitter follower counts follow a log-normal distribution, and I have the PDF of that distribution. Given that distribution, how do I determine how believable this number of followers is?

Upvotes: 1

Views: 420

Answers (2)

Josef

Reputation: 22897

This gives the same result as Hayden's answer.

For statistical tests with an asymmetric distribution, we get the p-value by taking the minimum of the two tail probabilities:

>>> from scipy.stats import lognorm
>>> r = 1.5
>>> 0.5 - abs(lognorm.cdf(r, 0.5, loc=0) - 0.5)
0.20870287338447135
>>> min((lognorm.cdf(r, 0.5), lognorm.sf(r, 0.5)))
0.20870287338447135

This is usually doubled to get the two-sided p-value, but there are some recent papers that suggest alternatives to the doubling.
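As a minimal sketch of the doubling convention, assuming the plain "twice the smaller tail" rule and a cap at 1 (the helper name two_sided_pvalue and its scale parameter are made up for illustration, not part of scipy):

```python
from scipy.stats import lognorm


def two_sided_pvalue(r, sigma, scale=1.0):
    """Doubled minimum-tail p-value for an observation r under a log-normal.

    Hypothetical helper: sigma is the shape parameter, scale is exp(mu).
    """
    dist = lognorm(sigma, loc=0, scale=scale)
    lower = dist.cdf(r)  # P(X <= r)
    upper = dist.sf(r)   # P(X > r)
    # Doubling the smaller tail could exceed 1 near the median, so cap it.
    return min(1.0, 2.0 * min(lower, upper))


p = two_sided_pvalue(1.5, 0.5)
print(p)
```

For r = 1.5 and σ = 0.5 this is simply twice the 0.2087 shown above, so roughly 0.417.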

Upvotes: 0

Andy Hayden

Reputation: 375377

The area under the PDF up to r is the CDF (which is conveniently a method of lognorm):

lognorm.cdf(r, 0.5, loc=0)
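As a quick sanity check of that relationship, the PDF can be integrated numerically from 0 to r and compared with the CDF. This is only a sketch using scipy.integrate.quad with the r and σ values from the question:

```python
from scipy.integrate import quad
from scipy.stats import lognorm

r, sigma = 1.5, 0.5

# Numerically integrate the PDF over (0, r); quad returns (value, error estimate).
area, abs_err = quad(lambda x: lognorm.pdf(x, sigma, loc=0), 0, r)

# The CDF gives the same area directly.
cdf_value = lognorm.cdf(r, sigma, loc=0)

print(area, cdf_value)
```

The two printed numbers should agree to within the integration error, which is why calling cdf directly is the simpler route.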


One thing you can use this to calculate is the Folded Cumulative Distribution (mentioned here), also known as a "mountain plot":

FCD = 0.5 - abs(lognorm.cdf(r, 0.5, loc=0) - 0.5)
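The same expression can be wrapped in a small function that also works on arrays. The sketch below is an illustration only: the name folded_cdf is invented here, and the follower-distribution parameters (median 200, σ = 1.2) are placeholders rather than real Twitter statistics.

```python
import numpy as np
from scipy.stats import lognorm


def folded_cdf(x, sigma, scale=1.0):
    """Folded CDF: equals the CDF below the median and 1 - CDF above it."""
    c = lognorm.cdf(x, sigma, loc=0, scale=scale)
    return 0.5 - np.abs(c - 0.5)


# Value from the question: r = 1.5, sigma = 0.5.
fcd_r = folded_cdf(1.5, sigma=0.5)

# Hypothetical follower distribution (assumed median 200, sigma 1.2).
fcd_followers = folded_cdf(126, sigma=1.2, scale=200.0)

print(fcd_r, fcd_followers)
```

The result lies between 0 and 0.5: values near 0.5 sit close to the median, while values near 0 sit far out in one of the tails.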

Upvotes: 2
