Reputation: 83
I have seen some topics on this issue (here and here), but in both cases the examples had multiple choices delimited by commas. My case is a little different, because some of the choices themselves contain a comma.
The list of predefined choices is registered as follows:
Q1.list <- c("Phrase one without comma", "Phrase two also without comma", "Phrase three, with comma")
The database looks like this:
Q1
"Phrase one without comma, Phrase two also without comma"
"Phrase two also without comma, Phrase three, with comma"
"Phrase three, with comma, Phrase four, other reasons"
"Phrase one without comma, Phrase four, other reasons, Phrase five other reasons"
And I would like to transform the data set in this way:
Q1.1  Q1.2  Q1.3  Others
1     1     0     0
0     1     1     0
0     0     1     "Phrase four, other reasons"
1     0     0     "Phrase four, other reasons, Phrase five other reasons [and everything else that is not on the Q1.list]"
Could someone shed light on how to solve this problem?
Upvotes: 0
Views: 439
Reputation: 1972
You can use dplyr and stringr as follows.
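library(dplyr)
library(stringr)

data %>%
  # One 0/1 indicator per predefined choice; unary + turns TRUE/FALSE into 1/0.
  transmute(Q1.1 = +(str_detect(Q1, Q1.list[1])),
            Q1.2 = +(str_detect(Q1, Q1.list[2])),
            Q1.3 = +(str_detect(Q1, Q1.list[3])),
            # Remove every known choice (the patterns are treated as regular expressions).
            Others = str_remove_all(Q1, str_c(Q1.list, collapse = '|')),
            # Drop the ", " left behind at the start of the string.
            Others = if_else(str_sub(Others, 1, 2) == ', ',
                             str_sub(Others, 3),
                             Others),
            # Nothing left means there were no other choices.
            Others = if_else(Others == '', '0', Others))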
#    Q1.1  Q1.2  Q1.3 Others
#   <int> <int> <int> <chr>
# 1     1     1     0 0
# 2     0     1     1 0
# 3     0     0     1 Phrase four, other reasons
# 4     1     0     0 Phrase four, other reasons, Phrase five other reasons
Data
data <- structure(list(Q1 = c("Phrase one without comma, Phrase two also without comma",
                              "Phrase two also without comma, Phrase three, with comma",
                              "Phrase three, with comma, Phrase four, other reasons",
                              "Phrase one without comma, Phrase four, other reasons, Phrase five other reasons")),
                  row.names = c(NA, -4L),
                  class = c("tbl_df", "tbl", "data.frame"))
Q1.list <- c("Phrase one without comma", "Phrase two also without comma", "Phrase three, with comma")
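The pipeline above writes out one column per entry of Q1.list by hand and only strips a ", " at the start of the leftover text. As a rough sketch, assuming base R only, the same idea can be written for any number of choices. The helper name split_choices and its prefix argument are made up for illustration and are not part of any package. Matching is done with fixed = TRUE, so the choices are compared literally rather than as regular expressions.

```r
# Hypothetical helper (not from any package): one 0/1 column per predefined
# choice, plus the text that matches none of them in `Others`.
split_choices <- function(x, choices, prefix = "Q1.") {
  # Indicator columns: 1 if the choice occurs literally in the answer.
  flags <- lapply(choices, function(ch) as.integer(grepl(ch, x, fixed = TRUE)))
  names(flags) <- paste0(prefix, seq_along(choices))
  out <- as.data.frame(flags)

  # Remove every known choice, one at a time, as a literal string.
  others <- x
  for (ch in choices) {
    others <- gsub(ch, "", others, fixed = TRUE)
  }

  # Tidy the separators left behind by the removals.
  others <- gsub("(, )+", ", ", others)  # collapse repeated ", "
  others <- gsub("^, |, $", "", others)  # drop a leading or trailing ", "
  others[others == ""] <- "0"            # nothing left: no other choices

  out$Others <- others
  out
}

Q1.list <- c("Phrase one without comma",
             "Phrase two also without comma",
             "Phrase three, with comma")

answers <- c("Phrase one without comma, Phrase two also without comma",
             "Phrase two also without comma, Phrase three, with comma",
             "Phrase three, with comma, Phrase four, other reasons",
             "Phrase one without comma, Phrase four, other reasons, Phrase five other reasons")

res <- split_choices(answers, Q1.list)
print(res)
```

With the tibble from the Data section this would be called as split_choices(data$Q1, Q1.list). One caveat: if one choice is a substring of another, both indicators are set, so the sketch assumes the predefined phrases do not overlap.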
Upvotes: 1