Reputation: 2297
I have data that looks like this:
I would like to build a new variable that shows only the music hobbies. I tried to use gsub to build it, but it did not work. Any suggestions on how to do this? Solutions are not limited to gsub.
My code is: df$music <- gsub("Sawing"|"Cooking", "", df$Hobby)
The outcome should look something like this:
Sample data can be built with:
df<- structure(list(Hobby = c("cooking, sawing, piano, violin", "cooking, violin",
"piano, sawing", "sawing, cooking")), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"))
Upvotes: 1
Views: 38
Reputation: 389012
Another way to do this would be:
library(dplyr)
library(tidyr)
df %>%
  mutate(index = row_number()) %>%               # remember which row each hobby came from
  separate_rows(Hobby, sep = ',\\s*') %>%        # one hobby per row
  group_by(index) %>%
  summarise(Music = toString(setdiff(Hobby, c('sawing', 'cooking'))),  # drop non-music hobbies
            Hobby = toString(Hobby)) %>%         # rebuild the original string
  select(Hobby, Music)
# Hobby Music
# <chr> <chr>
#1 cooking, sawing, piano, violin "piano, violin"
#2 cooking, violin "violin"
#3 piano, sawing "piano"
#4 sawing, cooking ""
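The same split, drop, and rejoin idea can be sketched in base R without dplyr/tidyr. This is a minimal sketch assuming hobbies are separated by a comma plus optional whitespace; non_music is a hypothetical name for the vector of hobbies to drop, and df is rebuilt here as a plain data.frame from the question's sample data.

```r
# Question's sample data as a plain data.frame (stringsAsFactors for R < 4.0)
df <- data.frame(Hobby = c("cooking, sawing, piano, violin", "cooking, violin",
                           "piano, sawing", "sawing, cooking"),
                 stringsAsFactors = FALSE)

# Hypothetical vector of hobbies that are not music
non_music <- c("sawing", "cooking")

# Split each string on a comma followed by optional whitespace
parts <- strsplit(df$Hobby, ",\\s*")

# Drop the non-music hobbies and paste the rest back into one string per row;
# toString(character(0)) gives "" when nothing is left
df$Music <- vapply(parts, function(h) toString(setdiff(h, non_music)), character(1))

df
```

vapply is used instead of sapply so the result is always a character vector, even for zero rows.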
Upvotes: 1
Reputation: 887213
The pattern should be a single quoted string, "Sawing|Cooking", and not "Sawing"|"Cooking". Also, the data is lowercase ("sawing", "cooking"), so the match needs ignore.case = TRUE:
df$music <- trimws(gsub("Sawing|Cooking", "", df$Hobby, ignore.case = TRUE),
                   whitespace = "(,\\s*){1,}")
trimws will then remove any leading/trailing commas, together with the spaces after them (if any).
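To see what each step does, here is a small sketch on the question's Hobby values (hobby is just a local copy of that column). The intermediate gsub result still carries the leftover separators, which is what the custom whitespace pattern in trimws cleans up; the whitespace argument needs R 3.6.0 or later. The last part shows why the original pattern fails: R evaluates "Sawing" | "Cooking" before gsub is called, and | is not defined for character strings.

```r
hobby <- c("cooking, sawing, piano, violin", "cooking, violin",
           "piano, sawing", "sawing, cooking")

# Step 1: remove the unwanted words; the ", " separators stay behind
step1 <- gsub("Sawing|Cooking", "", hobby, ignore.case = TRUE)
print(step1)

# Step 2: strip leading/trailing runs of ", " with a custom whitespace regex
step2 <- trimws(step1, whitespace = "(,\\s*){1,}")
print(step2)

# The question's pattern errors out before gsub runs, because `|`
# is applied to two character strings
err <- tryCatch("Sawing" | "Cooking", error = function(e) conditionMessage(e))
print(err)
```

Note that trimws only cleans the ends: a removed hobby in the middle, as in a hypothetical "piano, sawing, violin", would leave "piano, , violin".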
The opposite approach would be to extract the words of interest and paste them together:
library(stringr)
sapply(str_extract_all(df$Hobby, 'piano|violin'), toString)
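If you would rather avoid the stringr dependency, base R's regmatches() with gregexpr() extracts all matches in the same way. A minimal sketch: hobby is a local copy of the question's Hobby column, and the whitelist "piano|violin" is hard-coded as in the answer above, so any other instrument would have to be added to the pattern.

```r
hobby <- c("cooking, sawing, piano, violin", "cooking, violin",
           "piano, sawing", "sawing, cooking")

# gregexpr finds every match; regmatches returns them as a list with one
# character vector per input string (character(0) when nothing matched)
music_words <- regmatches(hobby, gregexpr("piano|violin", hobby))

# Collapse each vector to a single "a, b" string ("" for no matches)
music <- vapply(music_words, toString, character(1))
print(music)
```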
Upvotes: 3