Reputation: 1681
I think I am missing something quite obvious here. I have a multiple choice question (data here) with 5 answer categories.
I want to melt all 5 variables together to have one graph with ggplot2. Here is my code:
mydata <- data.frame(data$Q006_01, data$Q006_02, data$Q006_03, data$Q006_04, data$Q006_05) # multiple choice question
md <- melt(mydata, id=c("data.Q006_01", "data.Q006_02", "data.Q006_03", "data.Q006_04", "data.Q006_05"))
luogo_lavoro <- factor(md[,1]) # error here?
ggplot(data, aes(x=luogo_lavoro)) + geom_histogram() + xlab("") + ylab("Number of participants") + ggtitle("If you had to choose now, where would you be willing to accept a job?") + theme(axis.text.y = element_text(colour = "black"), axis.text.x = element_text(colour = "black")) + scale_x_discrete(labels=str_wrap(c("in the district I live in", "in another district as long as reachable within a dayride", "in the north of Italy", "in the rest of Italy", "abroad", "NA"), width=30)) + ggsave((filename="luogo_lavoro.pdf"), scale = 1, width = par("din")[1], height = par("din")[2], units = c("in", "cm", "mm"), dpi = 300, limitsize = TRUE)
What am I doing wrong here?
Upvotes: 0
Views: 1064
Reputation: 59395
Like this?
library(ggplot2)
library(reshape2)
library(stringr)
# add an explicit id column so melt() has a row identifier
data <- data.frame(id=1:nrow(data),data)
md <- melt(data,id="id")
# keep only the ticked (TRUE) answers; NA responses are dropped
p <- ggplot(subset(md,value & !is.na(value)), aes(x=variable)) +
  # on ggplot2 >= 2.0 use geom_bar() here: geom_histogram() no longer accepts a discrete x
  geom_histogram(colour="grey50",fill="lightgreen") + xlab("") + ylab("Number of participants") +
  ggtitle("If you had to choose now, where would you be willing to accept a job?") +
  theme(axis.text.y = element_text(colour = "black"),
        axis.text.x = element_text(colour = "black")) +
  scale_x_discrete(labels=str_wrap(c("in the district I live in",
    "in another district as long as reachable within a dayride",
    "in the north of Italy", "in the rest of Italy", "abroad", "NA"), width=30)) +
  coord_flip()
p
# ggsave() is a separate call, not something to add to the plot with "+"
ggsave(filename="luogo_lavoro.pdf", plot=p, scale = 1, width = par("din")[1], height = par("din")[2],
       units = "in", dpi = 300, limitsize = TRUE)
In melt(...), the id=... argument must specify a column that distinguishes between the different rows (equivalent to rownames). So I added an id column to data and melted on that. Now md has three columns: id, variable, and value. variable contains what used to be the column names (Q006_01, etc.), and value contains TRUE or FALSE depending on the response. value can also be NA if there was no answer.
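To see the reshaping on something small, here is a minimal sketch. The toy data frame and the name md_toy are made up for illustration (three respondents, two answer options); they are not the asker's data.

```r
library(reshape2)

# Hypothetical toy data: 3 respondents, 2 answer options (TRUE = option ticked)
toy <- data.frame(
  id      = 1:3,
  Q006_01 = c(TRUE, FALSE, NA),
  Q006_02 = c(TRUE, TRUE, FALSE)
)

# Melt on the id column: every other column becomes (variable, value) pairs
md_toy <- melt(toy, id.vars = "id")
print(md_toy)
```

Each respondent now appears once per answer option, so the 3 x 2 wide table becomes 6 long rows with the columns id, variable, and value.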
So in the call to ggplot(...) we use the subset of md where the response (value) is TRUE and not NA. With that subset, geom_histogram(...) counts the number of TRUEs for each variable. I included coord_flip() at the end so that the labels are more readable.
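If you want to check the bar heights without plotting, the same subset can be tabulated directly. This is only a sketch on made-up long-format data (md_toy, kept, and counts are hypothetical names), using base R only.

```r
# Hypothetical long-format data, shaped like the output of melt()
md_toy <- data.frame(
  id       = rep(1:3, times = 2),
  variable = factor(rep(c("Q006_01", "Q006_02"), each = 3)),
  value    = c(TRUE, FALSE, NA, TRUE, TRUE, FALSE)
)

# Same filter as in the ggplot() call: keep TRUE responses, drop FALSE and NA
kept <- subset(md_toy, value & !is.na(value))

# Number of TRUEs per answer option, i.e. what each bar should show
counts <- table(kept$variable)
print(counts)
```

The counts should match the bar lengths in the plot, which makes this a quick sanity check before saving the PDF.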
Upvotes: 4
Reputation: 60964
You probably need to pass md to ggplot instead of the raw data frame (data). In addition, it is best to make luogo_lavoro part of md:
md$luogo_lavoro <- factor(md[,1])
Upvotes: 0