Excalibur

Reputation: 441

How to draw multiple CDF plots of vectors with different number of rows

I want to draw the CDF plots of multiple variables in the same graph. The variables have different lengths. To keep things simple, I use the following example code:

library("ggplot2")

a1 <- rnorm(1000, 0, 3)
a2 <- rnorm(1000, 1, 4)
a3 <- rnorm(800, 2, 3)

df <- data.frame(x = c(a1, a2, a3),ggg = gl(3, 1000))
ggplot(df, aes(x, colour = ggg)) + stat_ecdf()+ coord_cartesian(xlim = c(0, 3)) + scale_colour_hue(name="my legend", labels=c('AAA','BBB', 'CCC'))

As we can see, a3 has length 800, which differs from a1 and a2. When I run the code, I get:

> df <- data.frame(x = c(a1, a2, a3),ggg = gl(3, 1000))
Error in data.frame(x = c(a1, a2, a3), ggg = gl(3, 1000)) : 
arguments imply differing number of rows: 2800, 3000
> ggplot(df, aes(x, colour = ggg)) + stat_ecdf()+ coord_cartesian(xlim = c(0, 3)) +    scale_colour_hue(name="my legend", labels=c('AAA','BBB', 'CCC'))
Error: ggplot2 doesn't know how to deal with data of class function

So, how can I draw the CDF plots of variables of different lengths in the same graph using ggplot2? Thanks in advance for any help!

Upvotes: 6

Views: 8529

Answers (2)

jlhoward

Reputation: 59425

ggplot has no trouble at all dealing with different counts in each group. The problem is how you create the factor ggg: gl(3, 1000) produces 3000 labels, but x has only 2800 values. Use this:

library(ggplot2)

a1 <- rnorm(1000, 0, 3)
a2 <- rnorm(1000, 1, 4)
a3 <- rnorm(800, 2, 3)

df <- data.frame(x = c(a1, a2, a3), ggg=factor(rep(1:3, c(1000,1000,800))))
ggplot(df, aes(x, colour = ggg)) + 
  stat_ecdf()+
  scale_colour_hue(name="my legend", labels=c('AAA','BBB', 'CCC'))

Also, with coord_cartesian(xlim = c(0, 3)) as in your code, only the part of the CDF on [0, 3] is shown, which, as you can see in the plot above, is more or less a straight line.
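
To make the label mismatch concrete, here is a minimal base-R sketch comparing gl() with rep(). It also shows one possible alternative, stack() on a named list, which derives the labels from the vector lengths so no counts need to be typed by hand. The names n, labels_gl, labels_rep and samples are illustrative and not part of the original code.

```r
n <- c(1000, 1000, 800)                    # group sizes from the question

# gl(3, 1000) repeats each of 3 levels exactly 1000 times.
labels_gl  <- gl(3, 1000)
# rep() with a vector of counts repeats each level its own number of times.
labels_rep <- factor(rep(1:3, times = n))

length(labels_gl)    # 3000: does not match the 2800 data values
length(labels_rep)   # 2800: matches length(c(a1, a2, a3))

# Alternative (an assumption, not from the answer): keep the vectors in a
# named list and let stack() build the long data frame and the labels.
samples <- list(
  AAA = rnorm(n[1], 0, 3),
  BBB = rnorm(n[2], 1, 4),
  CCC = rnorm(n[3], 2, 3)
)
df <- stack(samples)   # columns: `values` (data) and `ind` (factor of names)

table(df$ind)          # per-group counts follow the vector lengths
```

With the stacked data frame, a call such as ggplot(df, aes(values, colour = ind)) + stat_ecdf() should work, and the legend labels come from the list names.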

Upvotes: 6

MrFlick

Reputation: 206566

You're right in that ggplot sure does seem to want equal numbers of counts in each group. So rather than using stat_ecdf, perhaps you could just do the calculation yourself:

library(ggplot2)

a1 <- rnorm(1000, 0, 3)
a2 <- rnorm(1000, 1, 4)
a3 <- rnorm(800, 2, 3)

df <- data.frame(x = c(a1, a2, a3),ggg = factor(rep(1:3, c(1000,1000,800))))

df <- df[order(df$x), ]
df$ecdf <- ave(df$x, df$ggg, FUN=function(x) seq_along(x)/length(x))

ggplot(df, aes(x, ecdf, colour = ggg)) + geom_line() + scale_colour_hue(name="my legend", labels=c('AAA','BBB', 'CCC'))

Note that you were using gl() incorrectly; your code assumed all three groups had 1000 entries. Here I've changed it to rep() to get the right number of labels per group.
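
As a sanity check on the manual calculation, the sketch below compares it with base R's stats::ecdf(), evaluated separately for each group. The fixed seed and the ref column are assumptions added for illustration; the rest mirrors the code above.

```r
set.seed(42)  # assumption: fixed seed, only so the check is reproducible

a1 <- rnorm(1000, 0, 3)
a2 <- rnorm(1000, 1, 4)
a3 <- rnorm(800, 2, 3)

df <- data.frame(x = c(a1, a2, a3),
                 ggg = factor(rep(1:3, c(1000, 1000, 800))))

# Manual ECDF: sort by x, then take rank / group size within each group.
df <- df[order(df$x), ]
df$ecdf <- ave(df$x, df$ggg, FUN = function(x) seq_along(x) / length(x))

# Reference: build stats::ecdf() per group and evaluate it at the group's
# own points.
df$ref <- ave(df$x, df$ggg, FUN = function(x) ecdf(x)(x))

# The two columns should agree as long as there are no tied values in x.
all.equal(df$ecdf, df$ref)
```

The manual version relies on the rows being sorted by x before ave() is called; if the data contained ties, the two columns could differ at the tied points.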


Upvotes: 4
