Kerri Colman

Reputation: 55

Kolmogorov-Smirnov plot in R ggplot

I am trying to make a KS plot in R, and all seems to be going well, except that I can only use colour to distinguish the two samples, not line type.

I have tried the following:

sample1<-SD13009
sample2<-SD13009PB

group <- c(rep("sample1", length(sample1)), rep("sample2", length(sample2)))
dat <- data.frame(KSD = c(sample1,sample2), group = group)
cdf1 <- ecdf(sample1) 
cdf2 <- ecdf(sample2) 

minMax <- seq(min(sample1, sample2), max(sample1, sample2), length.out=length(sample1)) 
x0 <- minMax[which( abs(cdf1(minMax) - cdf2(minMax)) == max(abs(cdf1(minMax) - cdf2(minMax))) )] 
y0 <- cdf1(x0) 
y1 <- cdf2(x0) 

#attempt 1

plot<-ggplot(dat, aes(x = KSD, group = group, colour = group, linetype=group))+
  stat_ecdf(size=1) +
  mytheme + xlab("mm") +scale_x_continuous(limits=c(0,1))+
    ylab("Cumulative Distribution") +
    #geom_line(aes(group=group,size=1)) +
    geom_segment(aes(x = x0[1], y = y0[1], xend = x0[1], yend = y1[1]),
        linetype = "dashed", color = "red") +
    geom_point(aes(x = x0[1] , y= y0[1]), color="red", size=1) +
    geom_point(aes(x = x0[1] , y= y1[1]), color="red", size=1) +
    ggtitle("K-S Test: Sample 1 / Sample 2")

#attempt 2

cdf <- ggplot(dat, aes(x = KSD, group = group, linetype = group)) +
  stat_ecdf(aes(linetype = group)) +
  coord_cartesian(xlim = c(0, 0.8)) +
  geom_segment(aes(x = x0[1], y = y0[1], xend = x0[1], yend = y1[1]),
               linetype = "dashed", color = "red") +
  geom_point(aes(x = x0[1], y = y0[1]), color = "red", size = 1) +
  geom_point(aes(x = x0[1], y = y1[1]), color = "red", size = 1) +
  ggtitle("K-S Test: Sample 1 / Sample 2")

This is what I get:

[image: resulting plot]

Upvotes: 2

Views: 5146

Answers (1)

Axeman

Reputation: 35382

I cannot reproduce this with the following code:

# Make two random samples
sample1<-rnorm(1000)
sample2<-rnorm(1000, 2, 2)

group <- c(rep("sample1", length(sample1)), rep("sample2", length(sample2)))
dat <- data.frame(KSD = c(sample1,sample2), group = group)
cdf1 <- ecdf(sample1) 
cdf2 <- ecdf(sample2) 

minMax <- seq(min(sample1, sample2), max(sample1, sample2), length.out=length(sample1)) 
x0 <- minMax[which( abs(cdf1(minMax) - cdf2(minMax)) == max(abs(cdf1(minMax) - cdf2(minMax))) )] 
y0 <- cdf1(x0) 
y1 <- cdf2(x0) 


ggplot(dat, aes(x = KSD, group = group, colour = group, linetype=group))+
  stat_ecdf(size=1) +
  xlab("mm") +
  ylab("Cumulative Distribution") +
  geom_segment(aes(x = x0[1], y = y0[1], xend = x0[1], yend = y1[1]),
               linetype = "dashed", color = "red") +
  geom_point(aes(x = x0[1] , y= y0[1]), color="red", size=1) +
  geom_point(aes(x = x0[1] , y= y1[1]), color="red", size=1) +
  ggtitle("K-S Test: Sample 1 / Sample 2")

[image: plot produced by the code above]

It seems that in your plot the two lines are so close together that you cannot tell they have different linetypes, but they do.
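If the curves nearly overlap, one thing that may help is to choose strongly contrasting line types explicitly and zoom in on the region of interest. The following is only a minimal sketch: the sample values, the seed, the zoom range, and the choice of "solid" versus "dotted" are assumptions for illustration, not taken from your data.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the sketch is repeatable

# Two hypothetical samples with almost identical distributions,
# so their ECDFs lie nearly on top of each other
sample1 <- rnorm(1000)
sample2 <- rnorm(1000, mean = 0.05)

dat <- data.frame(
  KSD   = c(sample1, sample2),
  group = rep(c("sample1", "sample2"), c(length(sample1), length(sample2)))
)

p <- ggplot(dat, aes(x = KSD, linetype = group)) +
  stat_ecdf() +
  # Pick contrasting line types by hand instead of relying on the defaults
  scale_linetype_manual(values = c(sample1 = "solid", sample2 = "dotted")) +
  # Zoom without discarding data, so the ECDFs are computed on all points
  coord_cartesian(xlim = c(-0.5, 0.5)) +
  xlab("mm") +
  ylab("Cumulative Distribution")

print(p)
```

Note that `coord_cartesian()` only zooms the view, whereas `scale_x_continuous(limits = ...)` drops observations outside the limits before `stat_ecdf()` runs, which changes the curves themselves.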

Upvotes: 8
