Multiple choice variable with ggplot2

Question

I think I am missing something quite obvious here. I have a multiple-choice question (data here) with 5 answer categories.

I want to melt all 5 variables together to have one graph with ggplot2. Here is my code:

mydata <- data.frame(data$Q006_01, data$Q006_02, data$Q006_03, data$Q006_04, data$Q006_05) # multiple choice question
md <- melt(mydata, id=c("data.Q006_01", "data.Q006_02", "data.Q006_03", "data.Q006_04", "data.Q006_05"))
luogo_lavoro <- factor(md[,1]) # error here?
ggplot(data, aes(x=luogo_lavoro)) + geom_histogram() + xlab("") + ylab("Number of participants") + ggtitle("If you had to choose now, where would you be willing to accept a job?") + theme(axis.text.y = element_text(colour = "black"), axis.text.x = element_text(colour = "black")) + scale_x_discrete(labels=str_wrap(c("in the district I live in", "in another district as long as reachable within a dayride", "in the north of Italy", "in the rest of Italy", "abroad", "NA"), width=30)) + ggsave((filename="luogo_lavoro.pdf"), scale = 1, width = par("din")[1], height = par("din")[2], units = c("in", "cm", "mm"), dpi = 300, limitsize = TRUE)

What am I doing wrong here?

jlhoward · Accepted Answer

Like this?

library(ggplot2)
library(reshape2)
library(stringr)
# add a row identifier so that melt() has an id column to melt on
data <- data.frame(id=1:nrow(data),data)
md <- melt(data,id="id")
# keep only the TRUE responses; geom_bar() counts the rows per answer option
# (older ggplot2 versions also accepted geom_histogram() for a discrete x)
p <- ggplot(subset(md,value & !is.na(value)), aes(x=variable)) + 
  geom_bar(colour="grey50",fill="lightgreen") + xlab("") + ylab("Number of participants") + 
  ggtitle("If you had to choose now, where would you be willing to accept a job?") + 
  theme(axis.text.y = element_text(colour = "black"), 
        axis.text.x = element_text(colour = "black")) + 
  scale_x_discrete(labels=str_wrap(c("in the district I live in", 
                                     "in another district as long as reachable within a dayride", 
                                     "in the north of Italy", "in the rest of Italy", "abroad", "NA"), width=30)) +
  coord_flip()
p
# ggsave() is called on its own; current ggplot2 versions do not allow adding it with +
ggsave(filename="luogo_lavoro.pdf", plot=p, scale = 1, width = par("din")[1], height = par("din")[2], 
       units = "in", dpi = 300, limitsize = TRUE)

In melt(...), the id=... argument must name a column that identifies each row (the equivalent of row names); every other column is melted. Your code listed all five columns as id variables, which left nothing to melt. So I added an id column to data and melted on that. Now md has three columns: id, variable, and value. variable contains what used to be the column names (Q006_01, etc.), and value contains TRUE or FALSE depending on the response. value is NA if there was no answer.
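To make the shape of the melted data concrete, here is a minimal sketch on made-up data. The toy data frame and its values are assumptions for illustration only, not the survey data from the question.

```r
library(reshape2)

# Hypothetical stand-in for the survey: one logical column per answer option
toy <- data.frame(Q006_01 = c(TRUE, FALSE, NA),
                  Q006_02 = c(FALSE, TRUE, NA),
                  Q006_03 = c(TRUE, TRUE, FALSE))

# Add a row identifier and melt on it, as described above
toy <- data.frame(id = seq_len(nrow(toy)), toy)
toy_md <- melt(toy, id.vars = "id")

# One row per (respondent, answer option) pair
str(toy_md)
```

Each of the three answer columns contributes one row per respondent, so the long table has nrow(toy) * 3 rows, with the old column names in variable and the responses in value.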

So in the call to ggplot(...) we use the subset of md where the response (value) is TRUE and not NA. With that subset, geom_histogram(...) counts the number of TRUEs for each variable (in current ggplot2 versions, geom_bar() does this for a discrete x). I included coord_flip() at the end so that the labels are more readable.
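As a rough check of what the bars represent, the same counts can be computed without plotting. This sketch uses made-up data and builds the long format with base R only, so it runs without extra packages; the toy values and the names toy, toy_md, kept and counts are assumptions.

```r
# Hypothetical logical answers for two of the answer options
toy <- data.frame(Q006_01 = c(TRUE, FALSE, NA, TRUE),
                  Q006_02 = c(FALSE, TRUE, NA, FALSE))

# Long format equivalent to melting on an id column
toy_md <- data.frame(id = rep(seq_len(nrow(toy)), times = ncol(toy)),
                     variable = factor(rep(names(toy), each = nrow(toy))),
                     value = unlist(toy, use.names = FALSE))

# Same filter as in the ggplot() call: keep TRUE responses, drop FALSE and NA
kept <- subset(toy_md, value & !is.na(value))

# Rows per answer option, which is what the bar heights show
counts <- table(kept$variable)
print(counts)

# Cross-check: TRUEs per original column, ignoring NA
print(colSums(toy, na.rm = TRUE))
```

Both calls should report the same number of TRUE responses per answer option.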
