소재룡

Reputation: 31

How do I find out which distribution my data follows in R?

I have the following data frame.

IN <- c(3.5, 5.75, 9, 13.25, 13, 9.5, 9.25, 6.75, 7, 4.25, 3.25, 1.75, 0)
OUT <- c(0.25, 2, 5.25, 8.5, 10.5, 11, 11.75, 9.25, 9.5, 7, 3.75, 4, 3.5)
dat <- data.frame(IN, OUT)
rownames(dat) <- c("10~11", "11~12", "12~13", "13~14", "14~15", "15~16", "16~17", "17~18", "18~19", "19~20", "20~21", "21~22", "22~23")

The values are hourly averages, taken over four days, of the number of people counted in restaurants between 10:00 am and 11:00 pm.

I want to know which distribution the IN and OUT data each follow. How can I find this out in R? Failing that, is there a good way to analyze this data in R?

Upvotes: 0

Views: 3676

Answers (2)

mysteRious

Reputation: 4314

The fitdistrplus package can help with this kind of thing, but you need to know what candidate distributions you want to check. Let's try normal, uniform, and exponential:

library(fitdistrplus)
fit.in1 <- fitdist(dat$IN, "norm")
fit.in2 <- fitdist(dat$IN, "unif")
fit.in3 <- fitdist(dat$IN, "exp")
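To see what each fit actually estimated, the fitted objects can be inspected directly. A minimal sketch, restating the IN values so it runs on its own; the names fit_norm and fit_exp are illustrative and not from the answer:

```r
library(fitdistrplus)

# IN values copied from the question
IN <- c(3.5, 5.75, 9, 13.25, 13, 9.5, 9.25, 6.75, 7, 4.25, 3.25, 1.75, 0)

# Fit two of the candidate distributions by maximum likelihood (fitdist's default)
fit_norm <- fitdist(IN, "norm")
fit_exp  <- fitdist(IN, "exp")

# Estimated parameters: mean and sd for the normal, rate for the exponential
print(fit_norm$estimate)
print(fit_exp$estimate)

# Fuller report, including standard errors and the log-likelihood
summary(fit_norm)
```

The estimates should land close to the sample mean for the normal fit and to one over the sample mean for the exponential fit.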

Then you can plot some diagnostics:

par(mfrow=c(2,2))
denscomp(list(fit.in1,fit.in2,fit.in3),legendtext=c("Normal","Uniform","Exponential"))
qqcomp(list(fit.in1,fit.in2,fit.in3),legendtext=c("Normal","Uniform","Exponential"))
cdfcomp(list(fit.in1,fit.in2,fit.in3),legendtext=c("Normal","Uniform","Exponential"))
ppcomp(list(fit.in1,fit.in2,fit.in3),legendtext=c("Normal","Uniform","Exponential"))
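The plots can be complemented with a numeric comparison. The sketch below computes AIC by hand in base R for the normal and exponential candidates, using their closed-form maximum-likelihood estimates. AIC is one common comparison criterion and is an addition here, not something the answer itself uses; the uniform candidate is left out to keep the sketch short.

```r
# IN values copied from the question
IN <- c(3.5, 5.75, 9, 13.25, 13, 9.5, 9.25, 6.75, 7, 4.25, 3.25, 1.75, 0)

# Normal MLE: sample mean and the standard deviation with denominator n
mu    <- mean(IN)
sigma <- sqrt(mean((IN - mu)^2))
loglik_norm <- sum(dnorm(IN, mean = mu, sd = sigma, log = TRUE))

# Exponential MLE: rate = 1 / sample mean
rate <- 1 / mean(IN)
loglik_exp <- sum(dexp(IN, rate = rate, log = TRUE))

# AIC = 2 * (number of estimated parameters) - 2 * log-likelihood
aic <- c(normal      = 2 * 2 - 2 * loglik_norm,
         exponential = 2 * 1 - 2 * loglik_exp)
print(aic)
```

A lower AIC points to the better-supported candidate among those compared, though with only 13 observations small differences should not be over-read.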


Is it normal? Maybe:

> shapiro.test(dat$IN)

    Shapiro-Wilk normality test

data:  dat$IN
W = 0.96548, p-value = 0.8352

Is it uniform over [0,14]? Maybe:

> ks.test(dat$IN,"punif",0,14)

    One-sample Kolmogorov-Smirnov test

data:  dat$IN
D = 0.16758, p-value = 0.8024
alternative hypothesis: two-sided

The null hypothesis of each test is that the data come from the distribution being tested; the alternative is that they do not. A small p-value is therefore evidence against that candidate distribution. A large p-value, as in both tests here, only means the test found no evidence against the candidate, which is weak support with 13 observations.
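Since the question asks about both columns, the same two tests can be applied to IN and OUT in one pass. A sketch in base R; the helper name test_column is made up for illustration, and reusing the [0, 14] range for OUT is an assumption carried over from the IN example:

```r
# Data copied from the question
IN  <- c(3.5, 5.75, 9, 13.25, 13, 9.5, 9.25, 6.75, 7, 4.25, 3.25, 1.75, 0)
OUT <- c(0.25, 2, 5.25, 8.5, 10.5, 11, 11.75, 9.25, 9.5, 7, 3.75, 4, 3.5)
dat <- data.frame(IN, OUT)

# Hypothetical helper: p-values of the two tests for one numeric vector
test_column <- function(x, lower = 0, upper = 14) {
  c(shapiro = shapiro.test(x)$p.value,                       # H0: normal
    ks_unif = ks.test(x, "punif", lower, upper)$p.value)     # H0: uniform on [lower, upper]
}

# One column of p-values per variable
p_values <- sapply(dat, test_column)
print(round(p_values, 4))
```

The result is a small matrix with one row per test and one column per variable, so the IN column can be checked against the output shown above.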

Upvotes: 2

Vishesh Shrivastav

Reputation: 2139

You can use the fitdistrplus package as follows:

library(fitdistrplus)
IN <- c(3.5, 5.75, 9, 13.25, 13, 9.5, 9.25, 6.75, 7, 4.25, 3.25, 1.75, 0)
OUT <- c(0.25, 2, 5.25, 8.5, 10.5, 11, 11.75, 9.25, 9.5, 7, 3.75, 4, 3.5)
dat <- data.frame(IN, OUT)
rownames(dat) <- c("10~11", "11~12", "12~13", "13~14", "14~15", "15~16", 
                   "16~17", "17~18", "18~19", "19~20", "20~21", "21~22", "22~23")

# Obtain a Cullen and Frey graph
descdist(dat$IN, discrete = FALSE)

# Fit a distribution and inspect it 
normal_distribution <- fitdist(dat$IN, "norm")
plot(normal_distribution)
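A Cullen and Frey graph places the sample on a plot of kurtosis against squared skewness and compares that point with the values implied by common distributions. The two coordinates can be approximated by hand in base R, as sketched below with simple moment estimators. descdist uses unbiased estimators by default, so its numbers will differ somewhat; the helper central_moment is illustrative only.

```r
# IN values copied from the question
IN <- c(3.5, 5.75, 9, 13.25, 13, 9.5, 9.25, 6.75, 7, 4.25, 3.25, 1.75, 0)

# k-th central moment with denominator n (simple moment estimator)
central_moment <- function(x, k) mean((x - mean(x))^k)

skewness <- central_moment(IN, 3) / central_moment(IN, 2)^1.5
kurtosis <- central_moment(IN, 4) / central_moment(IN, 2)^2

# Position of the sample on the graph
sample_point <- c(sq_skewness = skewness^2, kurtosis = kurtosis)

# Theoretical (squared skewness, kurtosis) for three candidate distributions
reference <- rbind(normal      = c(0, 3),
                   uniform     = c(0, 1.8),
                   exponential = c(4, 9))
colnames(reference) <- names(sample_point)

print(rbind(sample = sample_point, reference))
```

The closer the sample row sits to a reference row, the more plausible that candidate is on this criterion alone.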

Read more about the CF graph here and here.

Upvotes: 0
