Til Hund

Reputation: 1681

Multiple choice variable with ggplot2

I think I'm missing something quite obvious here. I have a multiple choice question (data here) with 5 answer categories.

I want to melt all 5 variables together to have one graph with ggplot2. Here is my code:

mydata <- data.frame(data$Q006_01, data$Q006_02, data$Q006_03, data$Q006_04, data$Q006_05) # multiple choice question
md <- melt(mydata, id=c("data.Q006_01", "data.Q006_02", "data.Q006_03", "data.Q006_04", "data.Q006_05"))
luogo_lavoro <- factor(md[,1]) # error here?
ggplot(data, aes(x=luogo_lavoro)) + geom_histogram() + xlab("") + ylab("Number of participants") +
  ggtitle("If you had to choose now, where would you be willing to accept a job?") +
  theme(axis.text.y = element_text(colour = "black"), axis.text.x = element_text(colour = "black")) +
  scale_x_discrete(labels=str_wrap(c("in the district I live in", "in another district as long as reachable within a dayride", "in the north of Italy", "in the rest of Italy", "abroad", "NA"), width=30)) +
  ggsave((filename="luogo_lavoro.pdf"), scale = 1, width = par("din")[1], height = par("din")[2], units = c("in", "cm", "mm"), dpi = 300, limitsize = TRUE)

What am I doing wrong here?

Upvotes: 0


Answers (2)

jlhoward

Reputation: 59395

Like this?

library(ggplot2)
library(reshape2)
library(stringr)

# Assumes data holds only the five logical Q006_0x columns
data <- data.frame(id = 1:nrow(data), data)
# Long format: one row per (respondent, answer option)
md <- melt(data, id.vars = "id")

# Keep ticked options only; for NA, !is.na(value) is FALSE, so the row is dropped
p <- ggplot(subset(md, value & !is.na(value)), aes(x = variable)) +
  geom_bar(colour = "grey50", fill = "lightgreen") +  # counts rows per option
  xlab("") + ylab("Number of participants") +
  ggtitle("If you had to choose now, where would you be willing to accept a job?") +
  theme(axis.text.y = element_text(colour = "black"),
        axis.text.x = element_text(colour = "black")) +
  # One label per option; NA rows were removed above, so no "NA" label is needed
  scale_x_discrete(labels = str_wrap(c("in the district I live in",
                                       "in another district as long as reachable within a dayride",
                                       "in the north of Italy", "in the rest of Italy", "abroad"),
                                     width = 30)) +
  coord_flip()  # horizontal bars keep the long labels readable

# Save the plot object explicitly instead of adding ggsave() with +
ggsave(filename = "luogo_lavoro.pdf", plot = p,
       width = par("din")[1], height = par("din")[2], units = "in", dpi = 300)

In melt(...), the id=... argument must name a column that distinguishes the rows from each other (the equivalent of rownames). So I added an id column to data and melted on that. Now md has three columns: id, variable, and value. variable contains what used to be the column names (Q006_01, etc.), and value contains TRUE or FALSE depending on the response, or NA if there was no answer.
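To see what that reshaping produces, here is a minimal sketch on a hypothetical toy data frame: three respondents and only two of the five options. The name toy and its values are made up for illustration.

```r
library(reshape2)

# Hypothetical toy data: logical answers for two of the five options
toy <- data.frame(Q006_01 = c(TRUE, FALSE, NA),
                  Q006_02 = c(TRUE, TRUE, FALSE))

# Add a respondent identifier and melt on it
toy <- data.frame(id = 1:nrow(toy), toy)
md_toy <- melt(toy, id.vars = "id")

# One row per (respondent, option) pair, with columns
# id, variable (the old column name) and value (TRUE/FALSE/NA)
print(md_toy)
```

With all five columns and n respondents, the melted data has 5 * n rows.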

So in the call to ggplot(...) we pass only the rows of md where the response (value) is TRUE and not NA. The geom then simply counts the number of TRUEs for each option. I included coord_flip() at the end so that the long labels are more readable.
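The subset-and-count step can be checked without plotting. This sketch uses a hypothetical toy data frame (the names toy, ticked and counts are illustrative only): counting the remaining rows per option with table() gives the bar heights, which should match colSums() on the wide data with na.rm = TRUE.

```r
library(reshape2)

# Hypothetical toy data: id plus two logical answer columns
toy <- data.frame(id = 1:3,
                  Q006_01 = c(TRUE, FALSE, NA),
                  Q006_02 = c(TRUE, TRUE, FALSE))
md_toy <- melt(toy, id.vars = "id")

# Keep ticked options only: for NA, !is.na(value) is FALSE, and NA & FALSE is FALSE
ticked <- subset(md_toy, value & !is.na(value))

# Rows per option = height of each bar
counts <- table(ticked$variable)
print(counts)

# Same numbers straight from the wide data
print(colSums(toy[-1], na.rm = TRUE))
```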

Upvotes: 4

Paul Hiemstra

Reputation: 60964

You probably need to pass md to ggplot instead of the raw data. In addition, it is best to make luogo_lavoro a column of md:

md$luogo_lavoro <- factor(md[,1])
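The underlying point is that aes() looks up its variables in the data frame given to ggplot(). A minimal sketch, assuming md is already in long format with one row per ticked answer; the toy md below is hypothetical, and copying variable into luogo_lavoro is an illustrative choice rather than the exact column used above.

```r
library(ggplot2)

# Hypothetical long-format data: one row per ticked answer
md <- data.frame(variable = factor(c("Q006_01", "Q006_02", "Q006_02")))

# Store the x variable as a column of md, not as a free-standing vector
md$luogo_lavoro <- factor(md$variable)

# Pass md itself, so aes(x = luogo_lavoro) is resolved inside md
p <- ggplot(md, aes(x = luogo_lavoro)) +
  geom_bar() +
  ylab("Number of participants")
p
```

Because luogo_lavoro lives in md, it always has the same length as the data being plotted.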

Upvotes: 0
