Reputation: 359
I'm trying to build a cumulative density plot for a project using data that has different column lengths. I want to show graphically how water use differs between user groups with different sample sizes, with % of population (0% - 100%) along the x-axis and water use (gallons) along the y-axis.
So something like this I guess...
WaterUse <- data.frame(A = c(S1DataCP$atotwu, S2DataCP$atotwu, S3DataCP$atotwu, S4DataCP$atotwu) )
ggplot(WaterUse, aes(x = ?, color = c("1", "2", "3", "4"))) + stat_ecdf()
The data frame is built from queries of four different data sets, each gathering the water use (atotwu) of that particular population. The columns have different lengths and are in no particular order.
Upvotes: 0
Views: 192
Reputation: 3369
Let's create sample data of varying lengths that mimics the structure of your data frames, each with a column named atotwu:
set.seed(10)
S1DataCP <- cbind.data.frame(rnorm(50))
S2DataCP <- cbind.data.frame(rnorm(45))
S3DataCP <- cbind.data.frame(rnorm(55))
S4DataCP <- cbind.data.frame(rnorm(60))
colnames(S1DataCP) <- colnames(S2DataCP) <- colnames(S3DataCP) <- colnames(S4DataCP) <- "atotwu"
Here I just column bind an identifier corresponding to the site:
S1DataCP_df <- cbind(1, S1DataCP$atotwu)
S2DataCP_df <- cbind(2, S2DataCP$atotwu)
S3DataCP_df <- cbind(3, S3DataCP$atotwu)
S4DataCP_df <- cbind(4, S4DataCP$atotwu)
Since ggplot2 takes data in long format, we row bind the individual pieces into a single data frame:
WaterUse <- rbind.data.frame(S1DataCP_df, S2DataCP_df, S3DataCP_df, S4DataCP_df)
colnames(WaterUse) <- c("Site", "atotwu")
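The same long-format table can also be built in one step. This is only a sketch in base R, assuming the four data frames are collected in a list. The names `sites` and `WaterUse_alt` are made up for illustration, and the sample data is regenerated so the block runs on its own:

```r
# Hypothetical sample data, same shape as above: one column named atotwu
set.seed(10)
sites <- list(
  data.frame(atotwu = rnorm(50)),
  data.frame(atotwu = rnorm(45)),
  data.frame(atotwu = rnorm(55)),
  data.frame(atotwu = rnorm(60))
)

# Tag each data frame with its position in the list, then stack them
WaterUse_alt <- do.call(
  rbind,
  Map(function(d, i) data.frame(Site = i, atotwu = d$atotwu),
      sites, seq_along(sites))
)

# One row per observation, regardless of the unequal sample sizes
table(WaterUse_alt$Site)
```

Keeping the inputs in a list means a fifth site only needs one more list entry rather than another pair of cbind and rbind lines.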
Finally, plot the empirical CDF of each site with stat_ecdf():
library(ggplot2)
ggplot(WaterUse, aes(x = atotwu, colour = factor(Site))) + stat_ecdf() + labs(colour = "Site")
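Note that stat_ecdf() puts water use on the x-axis and the cumulative share on the y-axis, which is the reverse of what the question asks for. One possible way to get % of population on x and water use on y is to compute the cumulative share per site yourself and draw it with geom_step(). The sketch below assumes the same long-format layout as above. The column name `pct` and the regenerated sample data are illustrative, not part of the original data:

```r
library(ggplot2)

# Hypothetical sample data standing in for the real water-use queries
set.seed(10)
WaterUse <- data.frame(
  Site   = rep(1:4, times = c(50, 45, 55, 60)),
  atotwu = rnorm(210)
)

# Within each site, the share of users at or below each observed value
WaterUse$pct <- ave(WaterUse$atotwu, WaterUse$Site,
                    FUN = function(v) ecdf(v)(v))

# Share of population on x, water use on y, one step line per site
p <- ggplot(WaterUse, aes(x = pct, y = atotwu, colour = factor(Site))) +
  geom_step() +
  scale_x_continuous(labels = scales::percent) +
  labs(x = "% of population", y = "Water use (gallons)", colour = "Site")
p
```

Adding coord_flip() to the stat_ecdf() plot should give a similar picture with less code, at the cost of less control over how the steps are drawn.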
Upvotes: 1