It is quite easy to get a good fit of a chi-squared distribution for a limited range:
library(MASS)
nnn <- 1000
set.seed(101)
chii <- rchisq(nnn,4, ncp = 0) ## Generating a chi-squared sample with df = 4
chi_df <- fitdistr(chii,"chi-squared",start=list(df=3),method="BFGS") ## Fitting
chi_k <- chi_df$estimate[["df"]] ## Estimated degrees of freedom
chi_hist <- hist(chii,breaks=50,freq=FALSE) ## Plotting the histogram
curve(dchisq(x,df=chi_k),add=TRUE,col="green",lwd=3) ## Plotting the line
However, assume I have a data set where the distribution is spread out over the X-axis, and its new values are instead given by something like:
chii <- 5*rchisq(nnn,4, ncp = 0)
Without knowing this multiplicative factor 5 for a real data set, how do I normalize the rchisq() output (or more complex real data) to get a good fit with fitdistr()?
Thanks in advance for your help!
You will have to loop over candidate degrees of freedom to find the best fit for your data. As you probably know, the mean of a chi-squared distribution equals its degrees of freedom, so we can use that to rescale your data and remove the unknown multiplier.
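To see why rescaling by the mean works, here is a minimal sketch (the multipliers 5 and 37 and the helper `rescale_to_df` are illustrative, not part of the code below): dividing by the sample mean cancels any multiplier, so two samples that differ only by their scale factor become identical after rescaling.

```r
set.seed(101)
k <- 4                        # true degrees of freedom (unknown in practice)
base <- rchisq(1000, df = k)  # unscaled chi-squared sample; its mean is close to k
scaled_a <- 5 * base          # same sample, multiplier 5
scaled_b <- 37 * base         # same sample, multiplier 37

# Hypothetical helper: rescale a vector so that its mean equals a candidate df
rescale_to_df <- function(v, df) v / mean(v) * df

adj_a <- rescale_to_df(scaled_a, k)
adj_b <- rescale_to_df(scaled_b, k)

# The multiplier cancels: both rescaled vectors agree up to floating-point error
print(all.equal(adj_a, adj_b))
print(mean(adj_a))  # equals the candidate df by construction
```

The multiplier disappears after rescaling, but the candidate df is still a free choice; that is why the loop below tries many values and keeps the one the fit agrees with.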
In summary, you loop over possible degrees of freedom to find the one that best fits your rescaled data.
library(MASS)
nnn <- 1000
set.seed(101)
mult <- round(runif(1,1,100)) # generate a random (unknown) multiplier
chii <- mult*rchisq(nnn,4, ncp = 0) ## Generating a scaled chi-squared sample
max_df <- 100 # maximum degrees of freedom to test (here from 1 to 100)
chi_df_disp <- rep(NA,max_df) # ratio of fitted df to tested df, one entry per candidate
# loop across degree of freedom
for (i in 1:max_df) {
chii_adjusted <- (chii/mean(chii))*i # Adjust the chi-sq distribution so that the mean matches the tested degree of freedom
chi_fit <- fitdistr(chii_adjusted,"chi-squared",start=list(df=i),method="BFGS") ## Fitting
chi_df_disp[i] <- chi_fit$estimate/i # Ratio between the fitted df and the tested df (1 means they agree)
}
# Find the candidate whose ratio is closest to 1 (i.e. the best match between the estimated df and the tested df)
real_df <- which.min(abs(chi_df_disp-1))
print(real_df) # print the estimated degrees of freedom after correction
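Once the degrees of freedom are known, the multiplier itself follows from the same mean relation: if the data are m times a chi-squared variable with df degrees of freedom, their mean is m * df. A minimal sketch, assuming that model; `estimate_multiplier` is a hypothetical helper, and `real_df` is hard-coded to 4 as a stand-in for the loop's result so the snippet runs on its own:

```r
set.seed(101)
true_mult <- 5
chii <- true_mult * rchisq(1000, df = 4)  # scaled chi-squared sample
real_df <- 4  # stand-in for the df found by the loop (assumption)

# Hypothetical helper: E[m * X] = m * df, so m is estimated as mean(data) / df
estimate_multiplier <- function(data, df) mean(data) / df

mult_hat <- estimate_multiplier(chii, real_df)
print(mult_hat)

# Draw the fit on the ORIGINAL scale: if Y = m * X with X ~ chi-squared(df),
# then the density of Y is dchisq(y / m, df) / m
hist(chii, breaks = 50, freq = FALSE)
curve(dchisq(x / mult_hat, df = real_df) / mult_hat, add = TRUE, col = "blue", lwd = 3)
```

The change of variables in the last line avoids rescaling the data at all, which can be convenient when you want to report the fit in the data's own units.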
Now you can use the "real" degrees of freedom to rescale your data and plot the theoretical chi-squared density on top of it.
chii_adjusted <- (chii/mean(chii))*real_df
chi_hist <- hist(chii_adjusted,breaks=50,freq=FALSE) ## Plotting the histogram
curve(dchisq(x,df=real_df),add=TRUE,col="green",lwd=3) ## Plotting the line
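As a cross-check, note that a chi-squared variable with k degrees of freedom multiplied by m is a gamma variable with shape k/2 and rate 1/(2m). This is a different technique from the loop: a sketch that fits `"gamma"` with fitdistr() directly and converts the estimates back. It assumes the data really follow that scaled chi-squared model, and it yields a non-integer df estimate.

```r
library(MASS)

set.seed(101)
true_mult <- 5
chii <- true_mult * rchisq(1000, df = 4)  # scaled chi-squared sample

# m * chi-squared(k) is Gamma(shape = k/2, rate = 1/(2m)), so fit a gamma directly
gam_fit <- fitdistr(chii, "gamma")
shape_hat <- gam_fit$estimate[["shape"]]
rate_hat  <- gam_fit$estimate[["rate"]]

df_hat   <- 2 * shape_hat       # k = 2 * shape
mult_hat <- 1 / (2 * rate_hat)  # m = 1 / (2 * rate)
print(c(df = df_hat, multiplier = mult_hat))

# Overlay the fitted gamma density on the original (unrescaled) data
hist(chii, breaks = 50, freq = FALSE)
curve(dgamma(x, shape = shape_hat, rate = rate_hat), add = TRUE, col = "red", lwd = 3)
```

If an integer df is required, round(df_hat) can be compared with the value found by the loop.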