Reputation: 441
I want to draw the CDF plots of several variables in the same graph. The variables have different lengths. To keep things simple, I use the following example code:
library("ggplot2")
a1 <- rnorm(1000, 0, 3)
a2 <- rnorm(1000, 1, 4)
a3 <- rnorm(800, 2, 3)
df <- data.frame(x = c(a1, a2, a3),ggg = gl(3, 1000))
ggplot(df, aes(x, colour = ggg)) + stat_ecdf()+ coord_cartesian(xlim = c(0, 3)) + scale_colour_hue(name="my legend", labels=c('AAA','BBB', 'CCC'))
As we can see, a3 has length 800, which differs from a1 and a2. When I run the code, I get:
> df <- data.frame(x = c(a1, a2, a3),ggg = gl(3, 1000))
Error in data.frame(x = c(a1, a2, a3), ggg = gl(3, 1000)) :
arguments imply differing number of rows: 2800, 3000
> ggplot(df, aes(x, colour = ggg)) + stat_ecdf()+ coord_cartesian(xlim = c(0, 3)) + scale_colour_hue(name="my legend", labels=c('AAA','BBB', 'CCC'))
Error: ggplot2 doesn't know how to deal with data of class function
So, how can I draw the CDF plots of variables of different lengths in the same graph using ggplot2? Any help is appreciated!
Upvotes: 6
Views: 8529
Reputation: 59425
ggplot has no trouble at all dealing with different counts in each group. The problem is with how you create the factor ggg. Use this:
library(ggplot2)
a1 <- rnorm(1000, 0, 3)
a2 <- rnorm(1000, 1, 4)
a3 <- rnorm(800, 2, 3)
df <- data.frame(x = c(a1, a2, a3), ggg=factor(rep(1:3, c(1000,1000,800))))
ggplot(df, aes(x, colour = ggg)) +
stat_ecdf()+
scale_colour_hue(name="my legend", labels=c('AAA','BBB', 'CCC'))
Also, the way you have it set up, setting xlim=c(0,3) draws the CDF only on [0,3], which, as you can see in the plot above, is more or less a straight line.
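To avoid typing the group sizes by hand, the factor can also be derived from the data itself. The following is only a sketch of that idea, not part of the original answer: the list name samples and the use of the legend labels as list names are assumptions made for illustration.

```r
library(ggplot2)

set.seed(1)  # only so that this sketch is reproducible

# Hypothetical container: one named vector per group, lengths may differ
samples <- list(
  AAA = rnorm(1000, 0, 3),
  BBB = rnorm(1000, 1, 4),
  CCC = rnorm(800, 2, 3)
)

# lengths() returns the size of each vector, so rep() repeats every
# group name exactly as many times as that group has observations
df <- data.frame(
  x   = unlist(samples, use.names = FALSE),
  ggg = factor(rep(names(samples), lengths(samples)), levels = names(samples))
)

p <- ggplot(df, aes(x, colour = ggg)) +
  stat_ecdf() +
  scale_colour_hue(name = "my legend")
print(p)
```

Because the factor levels already carry the names, no separate labels argument is needed, and adding or resizing a group does not require touching the rep() call.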
Upvotes: 6
Reputation: 206566
stat_ecdf does not actually need equal counts in each group; the error comes from how the data frame is built. Still, rather than using stat_ecdf, you could do the calculation yourself:
library(ggplot2)
a1 <- rnorm(1000, 0, 3)
a2 <- rnorm(1000, 1, 4)
a3 <- rnorm(800, 2, 3)
df <- data.frame(x = c(a1, a2, a3), ggg = factor(rep(1:3, c(1000, 1000, 800))))
# Sort by x so that, within each group, row order equals rank order
df <- df[order(df$x), ]
# Per-group ECDF: rank within the group divided by the group size
df$ecdf <- ave(df$x, df$ggg, FUN = function(x) seq_along(x) / length(x))
ggplot(df, aes(x, ecdf, colour = ggg)) + geom_line() + scale_colour_hue(name="my legend", labels=c('AAA','BBB', 'CCC'))
Note that you were using gl() incorrectly: gl(3, 1000) assumes all three groups have 1000 entries. Here I've changed it to rep() to get the right number of labels per group.
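As a sanity check on the manual calculation, it can be compared with base R's ecdf() for a single group. This is a minimal sketch under the assumption that the sample has no tied values (which holds in practice for rnorm draws); the variable names are made up for the example.

```r
set.seed(42)  # only so that this sketch is reproducible
x <- rnorm(800, 2, 3)

# Manual ECDF: sort the sample, then rank / sample size
xs <- sort(x)
manual <- seq_along(xs) / length(xs)

# Reference: ecdf() returns a function; evaluate it at the sorted points
builtin <- ecdf(x)(xs)

# Stops with an error if the two calculations disagree
stopifnot(isTRUE(all.equal(manual, builtin)))
```

With tied values the two would differ at the ties, since ecdf() assigns every tied observation the same (highest) cumulative proportion while seq_along() keeps counting upward.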
Upvotes: 4