Alexis Solis
Alexis Solis

Reputation: 67

Get degrees of freedom for a Standardized T Distribution with MLE

First of all, I thank you all beforehand for reading this.

I am trying to fit a Standardized T-Student Distribution (i.e. a T-Student with standard deviation = 1) on a series of data; that is: I want to estimate the degrees of freedom via Maximum Likelihood Estimation.

An example of what I need to achieve can be found in the following (simple) Excel file I made: https://www.dropbox.com/s/6wv6egzurxh4zap/Excel%20Implementation%20Example.xlsx?dl=0

Inside the Excel file, I have an image that contains the formula corresponding to the calculation of the loglikelihood function for the Standardized T Student Distribution. The formula was extracted from a Finance book (Elements of Financial Risk Management - by Peter Christoffersen).

So far, I have tried this with R:

copula.data <- read.csv(file.choose(),header = TRUE)
z1 <- copula.data[,1]

library(fitdistrplus)


ft1 = fitdist(z1, "t", method = "mle", start = 10)
df1=ft1$estimate[1]

df1
logLik(ft1)

df1 yields the number: 13.11855278779897

logLike(ft1) yields the number: -3600.2918050056487

However, the Excel file yields degrees of freedom of: 8.2962365022727, and a log-likelihood of: -3588.8879 (which is the right answer).

Note: the .csv file that my code reads is the following: https://www.dropbox.com/s/nnh2jgq4fl6cm12/Data%20for%20T%20Copula.csv?dl=0

Any ideas? Thank you people!

Upvotes: 0

Views: 2480

Answers (1)

Ben Bolker
Ben Bolker

Reputation: 226057

The formula from your spreadsheet (with n, x substituted for the df parameter and the data)

=GAMMALN((n+1)/2)-GAMMALN(n/2)-LN(PI())/2-LN(n-2)/2-1/2*(1+n)*LN(1+x^2/(n-2))

or, exponentiating,

Gamma((n+1)/2) / (sqrt((n-2) pi) Gamma(n/2)) (1+x^2/(n-2))^-((n+1)/2)

?dt gives

f(x) = Gamma((n+1)/2) / (sqrt(n pi) Gamma(n/2)) (1 + x^2/n)^-((n+1)/2)

So the difference lies in those n-2 values in two places in the formula. I don't have enough context to see why the author is defining the t distribution in that different way; there may be some good reason ...

Looking at the negative log-likelihood curve directly, it certainly seems as though the fitdistrplus answer is agreeing with the direct calculation. (It would be very surprising if there were a bug in the dt() function, R's distribution functions are very broadly used and thoroughly tested.)

LL <- function(p,data=z1) {
    -sum(dt(data,df=p,log=TRUE))
}
pvec <- seq(6,20,by=0.05)
Lvec <- sapply(pvec,LL)
par(las=1,bty="l")
plot(pvec,Lvec,type="l",
     xlab="df parameter",ylab="negative log-likelihood")
## superimpose fitdistr results ...
abline(v=coef(ft1),lty=2)
abline(h=-logLik(ft1),lty=2)

Unless there's something else you're not telling us about the problem definition, it seems to me that R is getting the right answer. (The mean and sd of the data you gave were not exactly equal to 0 and 1 respectively, but they were close; centering and scaling gave an even larger value for the parameter.)

enter image description here

Upvotes: 2

Related Questions