Reputation: 45
I'm working with a data frame from a survey I took (n = 108). One of the questions (columns) has four possible answers; another has 10. Here is my issue: when I plot either of these columns, it plots every level. The four-answer column is treated as a factor with 13 levels and the 10-answer column as a factor with 22 levels. Each time someone chose more than one answer, the combination counts as a separate level (e.g. "A,B", "A,B,C", etc.). My question is: how do I represent how many respondents chose "A" or "B", or a combination of "A" and "B" but not necessarily only "A,B", regardless of what other choices they made, if any?
I wish to plot these items correctly, as well as analyze the data by, say, how many Female respondents chose "A" versus how many Male respondents, and so on.
My issues:
1) plot(data$letter) plots 13 different bars, whereas letter is a select-all-that-apply question with only 4 possible answers.
2) I can't show through analysis how many chose "A" if they also happened to choose another answer, because "A,C" isn't equivalent to "A".
Solutions I'm searching for:
1) When I call plot(data$letter), I want only four bars showing how many times each letter was chosen.
2) I need to work with all values of "A" in the analysis, even if the respondent selected more than just "A".
Thank you!
Also, "How to clean and re-code check-all-that-apply responses in R survey data?" is a question I found before posting that explains it thoroughly, but the code is fairly advanced for my level of experience with R.
Upvotes: 1
Views: 3861
Reputation: 15897
I can give you two ideas that might help for the two issues you mention. First, I create some sample data:
set.seed(175)
choices <- c("A", "B", "C", "A,B", "A,C", "B,C", "A,B,C")
data <- data.frame(respondent = 1:15,
                   letter = sample(choices, 15, replace = TRUE))
data
##    respondent letter
## 1           1    A,C
## 2           2    B,C
## 3           3    A,B
## 4           4      C
## 5           5    B,C
## 6           6    A,B
## 7           7      B
## 8           8      A
## 9           9    B,C
## 10         10      C
## 11         11      A
## 12         12      B
## 13         13      C
## 14         14    A,C
## 15         15  A,B,C
For simplicity, I used only three possible answers (A, B, and C).
1) The following function can be used to plot data$letter directly in the way you want:
plot_allapply <- function(choices) {
  # convert to character
  choices <- as.character(choices)
  # split at comma and unlist
  choices_split <- unlist(strsplit(choices, ","))
  # convert back to factor and plot
  plot(as.factor(choices_split))
}
plot_allapply(data$letter)
It works as follows: First, the data in letter needs to be converted from type factor to character. (I know that it is a factor, because otherwise you would not get a plot at all.) Then, each element of the character vector is split at the commas. (Run strsplit(as.character(data$letter), ",") to see how this works for your data, and see ?strsplit for more information.) Since this yields a list, it is converted to a character vector using unlist. The last line converts the result back to a factor (which plot needs in order to draw a bar chart of counts) and plots it.
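If you would rather see the counts themselves and not only the plot, the same split can be passed to table(). This is a minimal sketch, assuming the answers are separated by commas as in the sample data; the answers are typed in by hand from the printout above, and trimws() is added only as a precaution in case your real data has spaces after the commas.

```r
# Answers copied from the sample data printed above
letter <- c("A,C", "B,C", "A,B", "C", "B,C", "A,B", "B", "A",
            "B,C", "C", "A", "B", "C", "A,C", "A,B,C")

# Split every answer at the commas and flatten the result into one vector
choices_split <- unlist(strsplit(as.character(letter), ",", fixed = TRUE))

# Precaution (assumption): drop stray spaces such as in "A, B"
choices_split <- trimws(choices_split)

# Count how often each single option was chosen
counts <- table(choices_split)
counts

# One bar per option instead of one bar per combination
barplot(counts, xlab = "Answer", ylab = "Number of respondents")
```

For the sample data this gives 7 for A, 8 for B, and 9 for C, which matches the number of TRUE values per option when you check them by hand.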
2) There are many ways you could work with the data in data$letter. If you want to know which respondents chose "B", you could do
grepl("B", data$letter)
This will return a logical vector that is TRUE whenever "B" is contained in a respondent's answer. Thus, all of these will give TRUE: "B", "A,B", "A,B,C".
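One caveat: grepl() matches substrings, so it can give false hits if one answer label is contained in another, which may matter for your 10-answer question. The sketch below shows a stricter alternative that splits each answer first and then tests for an exact match. The helper name chose() and the labels "A" and "AB" are made up for illustration and are not from your data.

```r
# Hypothetical labels where one label ("A") is a substring of another ("AB")
answers <- c("A", "AB", "A,AB", "B")

# Substring matching also flags the respondent who chose only "AB"
grepl("A", answers, fixed = TRUE)

# chose() is a hypothetical helper: split at the commas, then test for an exact match
chose <- function(x, option) {
  parts <- strsplit(as.character(x), ",", fixed = TRUE)
  vapply(parts, function(p) option %in% trimws(p), logical(1))
}

# Exact matching: only respondents who really selected "A"
chose(answers, "A")
```

With single-letter options like A, B, C the two approaches agree, so plain grepl() is fine for data like the sample.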
Maybe it helps to add this information to your data frame. This could be done as follows:
data <- transform(data, isA = grepl("A", letter),
                        isB = grepl("B", letter),
                        isC = grepl("C", letter))
data
##    respondent letter   isA   isB   isC
## 1           1    A,C  TRUE FALSE  TRUE
## 2           2    B,C FALSE  TRUE  TRUE
## 3           3    A,B  TRUE  TRUE FALSE
## 4           4      C FALSE FALSE  TRUE
## 5           5    B,C FALSE  TRUE  TRUE
## 6           6    A,B  TRUE  TRUE FALSE
## 7           7      B FALSE  TRUE FALSE
## 8           8      A  TRUE FALSE FALSE
## 9           9    B,C FALSE  TRUE  TRUE
## 10         10      C FALSE FALSE  TRUE
## 11         11      A  TRUE FALSE FALSE
## 12         12      B FALSE  TRUE FALSE
## 13         13      C FALSE FALSE  TRUE
## 14         14    A,C  TRUE FALSE  TRUE
## 15         15  A,B,C  TRUE  TRUE  TRUE
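Once the indicator columns exist, comparing groups such as Female and Male respondents is a plain cross-tabulation. The following is only a sketch: the sample data has no gender column, so the one below is invented for illustration, and you would use the corresponding column from your own survey.

```r
# Answers copied from the sample data; the gender column is made up (assumption)
data <- data.frame(
  respondent = 1:15,
  letter = c("A,C", "B,C", "A,B", "C", "B,C", "A,B", "B", "A",
             "B,C", "C", "A", "B", "C", "A,C", "A,B,C"),
  gender = rep(c("Female", "Male", "Female"), times = 5)
)

# Indicator: did the respondent select "A", alone or with other options?
data$isA <- grepl("A", data$letter, fixed = TRUE)

# Counts of respondents by gender and by whether they chose "A"
tab <- table(gender = data$gender, choseA = data$isA)
tab

# Share of each gender that chose "A" (each row sums to 1)
prop.table(tab, margin = 1)
```

The same pattern works for isB, isC, or any other indicator column.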
Upvotes: 2