Reputation: 31
I have the following data frame.
IN <- c(3.5, 5.75, 9, 13.25, 13, 9.5, 9.25, 6.75, 7, 4.25, 3.25, 1.75, 0)
OUT <- c(0.25, 2, 5.25, 8.5, 10.5, 11, 11.75, 9.25, 9.5, 7, 3.75, 4, 3.5)
dat <- data.frame(IN, OUT)
rownames(dat) <- c("10~11", "11~12", "12~13", "13~14", "14~15", "15~16", "16~17", "17~18", "18~19", "19~20", "20~21", "21~22", "22~23")
These data are the hourly average number of people counted in restaurants over four days, from 10:00 am to 11:00 pm.
I want to find out which distribution the IN and OUT data each follow. How can I do this in R? Failing that, is there a good way to analyze these data in R?
Upvotes: 0
Views: 3676
Reputation: 4314
The fitdistrplus package can help with this kind of thing, but you need to know which candidate distributions you want to check. Let's try normal, uniform, and exponential:
library(fitdistrplus)
fit.in1 <- fitdist(dat$IN, "norm")
fit.in2 <- fitdist(dat$IN, "unif")
fit.in3 <- fitdist(dat$IN, "exp")
Then you can plot some diagnostics:
par(mfrow=c(2,2))
denscomp(list(fit.in1,fit.in2,fit.in3),legendtext=c("Normal","Uniform","Exponential"))
qqcomp(list(fit.in1,fit.in2,fit.in3),legendtext=c("Normal","Uniform","Exponential"))
cdfcomp(list(fit.in1,fit.in2,fit.in3),legendtext=c("Normal","Uniform","Exponential"))
ppcomp(list(fit.in1,fit.in2,fit.in3),legendtext=c("Normal","Uniform","Exponential"))
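The plots can be complemented with a numeric comparison. The following is a minimal sketch in base R, not part of the original answer: it fits the same three candidates by maximum likelihood by hand and computes an AIC for each, for both IN and OUT. The helper name `aic_by_hand` is hypothetical, and the AIC of the uniform fit should be read with caution because its parameters are simply the sample minimum and maximum.

```r
# Data from the question
IN  <- c(3.5, 5.75, 9, 13.25, 13, 9.5, 9.25, 6.75, 7, 4.25, 3.25, 1.75, 0)
OUT <- c(0.25, 2, 5.25, 8.5, 10.5, 11, 11.75, 9.25, 9.5, 7, 3.75, 4, 3.5)
dat <- data.frame(IN, OUT)

# Hypothetical helper: AIC of three candidates fitted by maximum likelihood
aic_by_hand <- function(x) {
  sd_mle <- sqrt(mean((x - mean(x))^2))  # MLE divides by n, not n - 1
  loglik <- c(
    norm = sum(dnorm(x, mean(x), sd_mle, log = TRUE)),
    unif = sum(dunif(x, min(x), max(x), log = TRUE)),
    exp  = sum(dexp(x, rate = 1 / mean(x), log = TRUE))
  )
  n_par <- c(norm = 2, unif = 2, exp = 1)  # number of estimated parameters
  2 * n_par - 2 * loglik                   # AIC = 2k - 2 * log-likelihood
}

# One column per variable, one row per candidate distribution
aic_table <- sapply(dat, aic_by_hand)
print(round(aic_table, 2))
```

A lower AIC indicates a better trade-off between fit and number of parameters, but with only 13 observations per variable the differences should be treated as a rough guide.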
Is it normal? Maybe:
> shapiro.test(dat$IN)
Shapiro-Wilk normality test
data: dat$IN
W = 0.96548, p-value = 0.8352
Is it uniform over [0,14]? Maybe:
> ks.test(dat$IN,"punif",0,14)
One-sample Kolmogorov-Smirnov test
data: dat$IN
D = 0.16758, p-value = 0.8024
alternative hypothesis: two-sided
The null hypothesis in each of these tests is that the data come from the distribution you are testing against; the alternative is that they do NOT. A small p-value is therefore evidence against that candidate distribution, while a large p-value (as in both tests here) only means the data do not rule it out.
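To make the decision rule concrete, here is a small sketch that collects the p-values of the two tests above and compares them with a significance level. The value `alpha = 0.05` is an assumption chosen for illustration, not something stated in the question.

```r
# IN data from the question
IN <- c(3.5, 5.75, 9, 13.25, 13, 9.5, 9.25, 6.75, 7, 4.25, 3.25, 1.75, 0)

alpha <- 0.05  # assumed significance level

# p-values of the same two tests shown above
p_values <- c(
  normal  = shapiro.test(IN)$p.value,
  uniform = ks.test(IN, "punif", 0, 14)$p.value
)

# TRUE means the candidate distribution is rejected at level alpha
rejected <- p_values < alpha
print(data.frame(p_value = round(p_values, 4), rejected))
```

Neither candidate being rejected does not show that both fit well; with so few observations these tests have little power to tell the candidates apart.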
Upvotes: 2
Reputation: 2139
You can use the fitdistrplus package as follows:
library(fitdistrplus)
IN <- c(3.5, 5.75, 9, 13.25, 13, 9.5, 9.25, 6.75, 7, 4.25, 3.25, 1.75, 0)
OUT <- c(0.25, 2, 5.25, 8.5, 10.5, 11, 11.75, 9.25, 9.5, 7, 3.75, 4, 3.5)
dat <- data.frame(IN, OUT)
rownames(dat) <- c("10~11", "11~12", "12~13", "13~14", "14~15", "15~16",
"16~17", "17~18", "18~19", "19~20", "20~21", "21~22", "22~23")
# Obtain a Cullen and Frey graph
descdist(dat$IN, discrete = FALSE)
# Fit a distribution and inspect it
normal_distribution <- fitdist(dat$IN, "norm")
plot(normal_distribution)
Read more about the CF graph here and here.
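As a rough illustration of what the Cullen and Frey graph displays, the sketch below computes the squared skewness and the kurtosis of IN with plain moment formulas in base R. This is an assumption-laden simplification: descdist may apply a bias correction, so the values it reports can differ somewhat from these.

```r
# IN data from the question
IN <- c(3.5, 5.75, 9, 13.25, 13, 9.5, 9.25, 6.75, 7, 4.25, 3.25, 1.75, 0)

m  <- mean(IN)
m2 <- mean((IN - m)^2)                 # second central moment

skewness <- mean((IN - m)^3) / m2^1.5  # simple moment estimate
kurtosis <- mean((IN - m)^4) / m2^2    # simple moment estimate, not excess kurtosis

# The graph places the sample at (skewness^2, kurtosis)
print(c(skewness_squared = skewness^2, kurtosis = kurtosis))
```

For reference, in these coordinates the normal distribution sits at (0, 3), the uniform at (0, 1.8), and the exponential at (4, 9), so the position of the sample relative to those points suggests which candidates are worth fitting.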
Upvotes: 0