jnowat

Reputation: 45

Recognizing 'select all that apply' answers as being separate in R

I'm working with a data frame from a survey I took (n=108). One of the questions (columns) has four possible answers; another has 10. Here is my issue: when I plot either of these columns, every distinct combination of answers is plotted as its own level. The four-answer column is treated as a factor with 13 levels and the 10-answer column as a factor with 22 levels. Each time someone chose more than one answer, the combination counts as a separate level (e.g. "A,B", "A,B,C", etc.). My question is: how do I count how many respondents chose "A", or "B", or both "A" and "B" (not only the exact answer "A,B"), regardless of what other choices they made, if any?

I wish to plot these items correctly, as well as analyze the data by, say, how many Female respondents chose "A" versus how many Male respondents, and so on.

My issues:

1) plot(data$letter) plots 13 different bars whereas letter is a question with only 4 possible answers, select all that apply.

2) I can't show through analysis how many chose "A" if they also happened to choose another answer, because "A,C" isn't equivalent to "A".

Solutions I'm searching for:

1) When I call plot(data$letter), I want only four bars showing how many times each letter was chosen.

2) I need to work with all occurrences of "A" in the analysis, even if the respondent selected more than just "A".

Thank you!

Also, How to clean and re-code check-all-that-apply responses in R survey data? is a question I found before posting that explains it in totality, but the code is fairly advanced at my level of experience with R.

Upvotes: 1

Views: 3861

Answers (1)

Stibu

Reputation: 15897

I can give you two ideas that might help for the two issues you mention. First, I create some sample data:

set.seed(175)
choices <- c("A", "B", "C", "A,B", "A,C", "B,C", "A,B,C")
data <- data.frame(respondent = 1:15,
                   letter = sample(choices, 15, replace = TRUE))
data
##    respondent letter
## 1           1    A,C
## 2           2    B,C
## 3           3    A,B
## 4           4      C
## 5           5    B,C
## 6           6    A,B
## 7           7      B
## 8           8      A
## 9           9    B,C
## 10         10      C
## 11         11      A
## 12         12      B
## 13         13      C
## 14         14    A,C
## 15         15  A,B,C

For simplicity, I used only three answer options (A, B and C).

1) The following function can be used to plot data$letter directly in the way you want:

plot_allapply <- function(choices) {

   # convert to character
   choices <- as.character(choices)

   # split at comma and unlist
   choices_split <- unlist(strsplit(choices, ","))

   # convert back to factor and plot
   plot(as.factor(choices_split))
}
plot_allapply(data$letter)

[Figure: bar plot with one bar per letter (A, B, C)]

It works as follows: First, the data in letter needs to be converted from type factor to character. (I know that it is a factor, because otherwise you would not get a plot at all.) Then, each element of the character vector is split at the commas. (Run strsplit(as.character(data$letter), ",") to see how this works for your data and ?strsplit for more information.) Since this yields a list, it is converted to a character vector using unlist. The last line converts the result back to a factor (which is needed in order for plot to create the right kind of plot) and plots it.
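
If you want the counts themselves rather than only the picture, the same split-and-unlist step can feed table(). This is a minimal sketch, assuming the answers are separated by commas; count_allapply is a hypothetical helper name and the small letter vector is made up for illustration.

```r
# Hypothetical helper: count how often each option was selected.
count_allapply <- function(choices, sep = ",") {
  # split every response at the separator and flatten into one vector
  choices_split <- unlist(strsplit(as.character(choices), sep, fixed = TRUE))
  # trimws() guards against responses written as "A, B"
  table(trimws(choices_split))
}

# made-up responses, not the survey data
letter <- c("A,C", "B,C", "A,B", "C", "A,B,C")
counts <- count_allapply(letter)
counts
```

Passing the result to barplot(counts) should give one bar per option, much like plotting the factor.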

2) There are many ways to work with the data in data$letter. If you want to know which respondents chose "B", you could do

grepl("B", data$letter)

This will return a logical vector that is TRUE whenever "B" is contained in a respondent's answer. Thus, all of these will give TRUE: "B", "A,B", "A,B,C".
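
One caveat: grepl matches substrings, so it may give false positives if one option's label is contained in another's (which could matter for a question with ten options). A possible alternative, sketched here under the assumption that answers are comma-separated, is to split each response and test for an exact match. The helper name chose and the example answers are hypothetical.

```r
# Hypothetical helper: TRUE where `option` is one of the selected answers.
chose <- function(choices, option, sep = ",") {
  selected <- strsplit(as.character(choices), sep, fixed = TRUE)
  # compare against whole options, ignoring stray spaces
  vapply(selected, function(s) option %in% trimws(s), logical(1))
}

# made-up answers where one label ("Car") is part of another ("Carpool")
answers <- c("Car", "Carpool,Bus", "Bus", "Car,Bus")
grepl("Car", answers)   # substring match: also flags "Carpool,Bus"
chose(answers, "Car")   # exact match on whole options
```

For single-letter options such as "A" to "D", both approaches should agree.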

Maybe it helps to add this information to your data frame. This could be done as follows:

data <- transform(data, isA = grepl("A", letter),
                  isB = grepl("B", letter),
                  isC = grepl("C", letter))
data
##    respondent letter   isA   isB   isC
## 1           1    A,C  TRUE FALSE  TRUE
## 2           2    B,C FALSE  TRUE  TRUE
## 3           3    A,B  TRUE  TRUE FALSE
## 4           4      C FALSE FALSE  TRUE
## 5           5    B,C FALSE  TRUE  TRUE
## 6           6    A,B  TRUE  TRUE FALSE
## 7           7      B FALSE  TRUE FALSE
## 8           8      A  TRUE FALSE FALSE
## 9           9    B,C FALSE  TRUE  TRUE
## 10         10      C FALSE FALSE  TRUE
## 11         11      A  TRUE FALSE FALSE
## 12         12      B FALSE  TRUE FALSE
## 13         13      C FALSE FALSE  TRUE
## 14         14    A,C  TRUE FALSE  TRUE
## 15         15  A,B,C  TRUE  TRUE  TRUE
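
With such indicator columns in place, comparing groups comes down to a cross-tabulation. The following is only a sketch: the question mentions respondents' gender, but the column name gender and the values below are assumptions made up for illustration.

```r
# Small made-up example; `gender` is a hypothetical column name.
survey <- data.frame(
  gender = c("Female", "Male", "Female", "Male", "Female"),
  letter = c("A,C", "B,C", "A,B", "C", "A,B,C")
)
survey$isA <- grepl("A", survey$letter)

# rows: gender, columns: whether "A" was among the selected answers
tab <- table(gender = survey$gender, choseA = survey$isA)
tab

# share of each gender that chose "A" (each row sums to 1)
prop.table(tab, margin = 1)
```

The same pattern should work for any of the indicator columns, e.g. isB or isC.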

Upvotes: 2
