Reputation: 13
I have a dataset consisting of a single variable which happens to be left censored (censoring point is 0). I believe that the latent variable (i.e. the variable before censoring takes place) more or less follows a normal distribution. How can I - using R - find the parameters of this distribution?
Given the wealth of R-packages I'm surprised I have not been able to find any that easily solves the problem at hand. Judging by the name the fitdistcens-function from the fitdistrplus-package might be useful in this context. But if I read the documentation correctly - which I doubt - the function requires two columns, one of which should contain the uncensored data:
censdata A dataframe of two columns respectively named left and right, describing each observed value as an interval. The left column contains either NA for left censored observations, the left bound of the interval for interval censored observations, or the observed value for non-censored observations. The right column contains either NA for right censored observations, the right bound of the interval for interval censored observations, or the observed value for non-censored observations.
Does this mean that the function can't be used for my purpose? If so what are the alternatives?
Help (possibly involving an example) is much appreciated.
Upvotes: 1
Views: 2212
Reputation: 739
I ran into the same thing and I believe you just have to specify the minimum value on the right side and NA on the left side when an observation has been censored.
For example, here's a simulation showing that much:
library(fitdistrplus)
# These are the actual parameters of the right-censored, normal distribution to be fit
mu <- 10; sd <- 10;
# Number of samples to use
n <- 100000;
# Minimum value to censor at
x.min <- 0
# Create the sample to fit
x <- rnorm(n, mu, sd)
d <- data.frame(
left= ifelse(x <= x.min, NA, x), # Assign left side as NA when censored
right=ifelse(x <= x.min, x.min, x) # Assign right side as x.min when censored
)
fitdistcens(d, 'norm')
# Fitting of the distribution ' norm ' on censored data by maximum likelihood
# Parameters:
# estimate
# mean 10.01450
# sd 10.00456
Upvotes: 0
Reputation: 13
I wrote up chunk of R-code that seems to do the job:
#Set censoring point
a<-0
#Generate random censored data
x<-rnorm(1000,-1,2)
x[x<a]<-a
length(which(x>a))
#Log-likelihood function
ll<-function(u=0,s=1){
-sum(pnorm(a,mean=u,sd=s,log=TRUE)*(x==a)+dnorm(x,mean=u,sd=s,log=TRUE)*(x>a))
}
#Run MLE - change initiation values (u and s)
require("stats4")
est<-mle(ll,start=list(u=0,s=1),,method="L-BFGS-B",lower = c(-Inf, 0))
#Show fit
plot.ecdf(x)
xplot<-seq(from=a,to=max(x),length=100)
lines(xplot,pnorm(xplot,mean=est@coef[1],sd=est@coef[2]),col="green")
Upvotes: 0