Reputation: 97
I am working with survey data in which one question allowed multiple answers, so a respondent could tick more than one box. As a result, the data set stores all of a respondent's answers concatenated into a single value.
df <- c("VrolijkGemotiveerd", "RelaxtGemotiveerdVrolijk", "Neutraal", "TrotsGezegend", "Neutraal", "Neutraal", "VermoeidGemotiveerd")
For example, I want to split RelaxtGemotiveerdVrolijk into Column 1: Relaxt, Column 2: Gemotiveerd and Column 3: Vrolijk.
Upvotes: 0
Views: 481
Reputation: 2748
It looks like you want to split each string wherever an upper-case letter occurs, which can be done using a regular expression. There are lots of functions that you can use to apply regexes in this way, e.g. strsplit(), stringr::str_split() etc., but tidyr has a function specifically for adding new columns using this method:
df <- data.frame(
  c1 = c("VrolijkGemotiveerd", "RelaxtGemotiveerdVrolijk", "Neutraal",
         "TrotsGezegend", "Neutraal", "Neutraal", "VermoeidGemotiveerd")
)
tidyr::separate(df, c1, into = c("c2", "c3", "c4"),
                sep = "(?<=.)(?=[[:upper:]])", fill = "right", remove = FALSE)
#>                         c1       c2          c3      c4
#> 1       VrolijkGemotiveerd  Vrolijk Gemotiveerd    <NA>
#> 2 RelaxtGemotiveerdVrolijk   Relaxt Gemotiveerd Vrolijk
#> 3                 Neutraal Neutraal        <NA>    <NA>
#> 4            TrotsGezegend    Trots    Gezegend    <NA>
#> 5                 Neutraal Neutraal        <NA>    <NA>
#> 6                 Neutraal Neutraal        <NA>    <NA>
#> 7      VermoeidGemotiveerd Vermoeid Gemotiveerd    <NA>
EDIT: Updated to use the regular expression from @Laterow's answer, as mine was a bit broken.
Upvotes: 1
Reputation: 3235
Answer
Assuming that categories always start with capital letters, use strsplit with Perl-compatible regular expressions:
strsplit(df, "(?<=.)(?=[[:upper:]])", perl = TRUE)
Output:
[[1]]
[1] "Vrolijk" "Gemotiveerd"
[[2]]
[1] "Relaxt" "Gemotiveerd" "Vrolijk"
[[3]]
[1] "Neutraal"
[[4]]
[1] "Trots" "Gezegend"
[[5]]
[1] "Neutraal"
[[6]]
[1] "Neutraal"
[[7]]
[1] "Vermoeid" "Gemotiveerd"
Rationale
strsplit lets you split strings by a pattern, and regular expressions let you describe that pattern. The core of the pattern is the capital letter (i.e. [[:upper:]]). Wrapping it in a lookahead, (?=[[:upper:]]), makes the match zero-width, so the string is split before the capital letter and the letter itself is kept. The lookbehind (?<=.) requires at least one character before the split point, so no split happens at the very start of a string.
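One way to see what the lookbehind contributes is to run the pattern with and without it and compare the results. This is only a sketch: the variable names (answers, full_pattern, lookahead_only and so on) are made up for illustration, and the lookahead-only variant is included purely for comparison, since it may split differently from what you expect in base R.

```r
# A small sample taken from the question's data (names are illustrative).
answers <- c("TrotsGezegend", "Neutraal", "RelaxtGemotiveerdVrolijk")

# Pattern from the answer: lookbehind for any character, lookahead for a capital.
full_pattern <- "(?<=.)(?=[[:upper:]])"

# Same pattern without the lookbehind, kept only for comparison.
lookahead_only <- "(?=[[:upper:]])"

with_lookbehind <- strsplit(answers, full_pattern, perl = TRUE)
without_lookbehind <- strsplit(answers, lookahead_only, perl = TRUE)

# Print both results side by side to inspect the difference.
print(with_lookbehind)
print(without_lookbehind)
```

Both lookarounds are zero-width, so neither variant removes any characters from the input; they differ only in where the splits land.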
strsplit() returns a list with one character vector per input string, which you can then use for further processing.
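If the goal is the column layout from the question, the list can be padded to a common length and bound into a data frame using base R only. The following is a minimal sketch under that assumption; the names answers, parts, padded, wide and the answer_ column prefix are hypothetical and not part of the original answer.

```r
# Data from the question (a plain character vector).
answers <- c("VrolijkGemotiveerd", "RelaxtGemotiveerdVrolijk", "Neutraal",
             "TrotsGezegend", "Neutraal", "Neutraal", "VermoeidGemotiveerd")

# Split before each capital letter, as in the answer above.
parts <- strsplit(answers, "(?<=.)(?=[[:upper:]])", perl = TRUE)

# The largest number of answers any respondent gave.
n_max <- max(lengths(parts))

# Pad every vector with NA so that all have length n_max.
padded <- lapply(parts, function(p) {
  length(p) <- n_max
  p
})

# Bind the padded vectors row-wise and convert to a data frame.
wide <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(wide) <- paste0("answer_", seq_len(n_max))

print(wide)
```

Respondents who ticked fewer boxes get NA in the remaining columns, which mirrors the fill = "right" behaviour of the tidyr approach.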
Upvotes: 0