Scott

Reputation: 79

How to plot the number of valid rows with ggplot2

Given a data frame such as

df <- data.frame(name = c("a", "b", "c", "d", "e"),
                 class = c("a1", "a1", "a1", "b1", "b1"),
                 var1 = c("S", "S", "R", "S", "S"),
                 var2 = c("S", "R", NA, NA, "R"),
                 var3 = c(NA, "R", "R", "S", "S"))

I would like to plot, for each of var1 through var3, the number of rows that are not NA.

One way I found is to build a second data frame holding the counts:

df_count <- matrix(nrow=3, ncol=2)
df_count <- as.data.frame(df_count)
names(df_count) <- c("var_num", "count")
df_count$var_num <- as.factor(names(df)[3:5])
for (i in 1:3) {
    df_count[i,2] <- sum(!is.na(df[,i+2]))
}

and then plot as

ggplot(df_count, aes(x=var_num, y=count)) + geom_bar(stat="identity")

Is there an easier way to choose var1 through var3 and count the valid rows without generating a new dataframe?

Upvotes: 3

Views: 2645

Answers (1)

Sathish

Reputation: 12713

library('ggplot2')
library('reshape2')

df_long <- melt(df, id.vars = c('name', 'class'))  # melt data to long format
df_long <- df_long[!is.na(df_long$value), ]        # remove NA
# count the non-NA rows for each variable
df_count <- aggregate(df_long['value'], by = list(variable = df_long$variable), FUN = length)
names(df_count)[2] <- 'count'

ggplot(df_count, aes( x = variable, y = count, fill = variable )) + 
   geom_bar(stat="identity")


Stacked bar, with each variable split by value:

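# Start from the original df (see Data below), not an aggregated copy.
df_long <- melt(df, id.vars = c('name', 'class'))  # melt data to long format
df_long <- df_long[!is.na(df_long$value), ]        # remove NA
# count rows for each combination of variable and value
df_count <- aggregate(df_long['name'],
                      by = list(variable = df_long$variable, value = df_long$value),
                      FUN = length)
names(df_count)[3] <- 'count'

ggplot(df_count, aes( x = variable, y = count, fill = value )) + 
  geom_bar(stat="identity")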
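
If the goal is mainly to avoid building the counts by hand, one possible shortcut is to let ggplot2 do the counting: geom_bar() uses stat = "count" by default, so it tallies the rows per x value itself. The sketch below is only an illustration on the example data; the names df_long, df_valid and p are made up for it, and it still needs the melted (long) copy of the data.

```r
library(ggplot2)
library(reshape2)

# Example data from the question.
df <- data.frame(name = c("a", "b", "c", "d", "e"),
                 class = c("a1", "a1", "a1", "b1", "b1"),
                 var1 = c("S", "S", "R", "S", "S"),
                 var2 = c("S", "R", NA, NA, "R"),
                 var3 = c(NA, "R", "R", "S", "S"))

# Long format: one row per (name, variable) pair, then drop the NA values.
df_long <- melt(df, id.vars = c("name", "class"))
df_valid <- df_long[!is.na(df_long$value), ]

# No aggregate() step: geom_bar() counts the remaining rows per variable.
p <- ggplot(df_valid, aes(x = variable)) +
  geom_bar()
p
```

Adding fill = value inside aes() should give the stacked version in the same way, again without a separate aggregation step.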


Data:

df <- data.frame(name = c("a", "b", "c", "d", "e"),
                 class = c("a1", "a1", "a1", "b1", "b1"),
                 var1 = c("S", "S", "R", "S", "S"),
                 var2 = c("S", "R", NA, NA, "R"),
                 var3 = c(NA, "R", "R", "S", "S"))

Upvotes: 3
