How to build a cumulative density plot using data with different column lengths, ggplot2?

Question

I'm trying to build a cumulative density plot for a project using data that has different column lengths. I want to show graphically how water use differs between user groups of different sample sizes: something with % of population (0% to 100%) along the x-axis and water use (gallons) along the y-axis.

So something like this I guess...

WaterUse <- data.frame(A = c(S1DataCP$atotwu, S2DataCP$atotwu, S3DataCP$atotwu, S4DataCP$atotwu) )

ggplot(WaterUse, aes(x = ?, color = c("1", "2", "3", "4"))) + stat_ecdf()

Where the data frame is built from queries of four different data sets gathering the water use (atotwu) of that particular population. The columns in the data frame are of different lengths and have no given order.

Djork · Accepted Answer

Let's create sample data of varying length that mimics the structure of your data frames, each with a column named atotwu:

set.seed(10)
S1DataCP <- cbind.data.frame(rnorm(50))
S2DataCP <- cbind.data.frame(rnorm(45))
S3DataCP <- cbind.data.frame(rnorm(55))
S4DataCP <- cbind.data.frame(rnorm(60))
colnames(S1DataCP) <- colnames(S2DataCP) <- colnames(S3DataCP) <- colnames(S4DataCP) <- "atotwu"

Here I just column bind an identifier corresponding to the site:

S1DataCP_df <- cbind(1, S1DataCP$atotwu)
S2DataCP_df <- cbind(2, S2DataCP$atotwu)
S3DataCP_df <- cbind(3, S3DataCP$atotwu)
S4DataCP_df <- cbind(4, S4DataCP$atotwu)

Since ggplot2 takes data in long format, we row bind the individual pieces into a single data frame:

WaterUse <- rbind.data.frame(S1DataCP_df, S2DataCP_df, S3DataCP_df, S4DataCP_df)
colnames(WaterUse) <- c("Site", "atotwu")

Finally, plot the empirical cumulative distribution with stat_ecdf:

library(ggplot2)
ggplot(WaterUse, aes(x=atotwu, colour=factor(Site))) + stat_ecdf() + labs(colour="Site")
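The question asks for percentage of population along the x-axis and water use along the y-axis, whereas stat_ecdf draws the cumulative fraction on the y-axis by default. One possible way to get that orientation is to add coord_flip(). The sketch below is only an illustration under stated assumptions: the `sites` list and its simulated values are hypothetical stand-ins for the four atotwu columns, and stack() is used as a shorter base R route to the same long format.

```r
library(ggplot2)

set.seed(10)
# Hypothetical stand-ins for the atotwu columns of the four data sets;
# the vectors deliberately have different lengths.
sites <- list(
  "1" = rnorm(50),
  "2" = rnorm(45),
  "3" = rnorm(55),
  "4" = rnorm(60)
)

# stack() converts a named list of vectors into a long data frame
# with the columns `values` and `ind` (a factor built from the names).
WaterUse <- stack(sites)
colnames(WaterUse) <- c("atotwu", "Site")

# coord_flip() swaps the axes, so the cumulative fraction runs along
# the horizontal axis and water use along the vertical axis.
# labs() still refers to the original aesthetics (x = atotwu).
p <- ggplot(WaterUse, aes(x = atotwu, colour = Site)) +
  stat_ecdf() +
  coord_flip() +
  labs(x = "Water use (gallons)",
       y = "Cumulative fraction of population",
       colour = "Site")
p
```

Because each group gets its own ECDF, the different sample sizes do not need any padding or alignment; every curve runs from 0 to 1 regardless of how many observations it contains.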
