Reputation: 101
Consider the following string:
str_input <- c("Mellanox,Asia, China, India, JAVA, United States, APIs")
I use the following gsub call to remove my specific stopwords:
gsub(paste0("\\b(",paste(location_sw, collapse="|"),")\\b"), "", str_input)
where location_sw is my list of stopwords:
location_sw <- c('Rose', 'Java', 'JAVA', 'Mellanox', 'Microsoft', '144GiB', 'West',
'Amazon', 'Channel Asia', 'jClarity', 'APIs')
With this gsub call, I get the following output:
",Asia, China, India, , United States, "
However, I would like this result:
"Asia, China, India, United States"
How can I also remove the commas that are left behind once the stopwords are gone? Any input would be really helpful.
Reputation: 13319
A base R option:
paste(lapply(strsplit(str_input, ",\\s*"), function(x)
  x[!x %in% location_sw])[[1]], collapse = ", ")
[1] "Asia, China, India, United States"
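Because of the [[1]], the one-liner above only returns the cleaned first element of str_input. If you have several such strings, a minimal sketch of the same split-filter-paste idea applied to every element could look like this; remove_sw and the second input string are hypothetical, made up for illustration.

```r
# Stopword list from the question
location_sw <- c('Rose', 'Java', 'JAVA', 'Mellanox', 'Microsoft', '144GiB', 'West',
                 'Amazon', 'Channel Asia', 'jClarity', 'APIs')

# Hypothetical helper: drop stopwords from every element of a character vector
remove_sw <- function(x, stopwords) {
  parts <- strsplit(x, ",\\s*")  # one character vector of items per input string
  vapply(parts,
         function(p) paste(p[!p %in% stopwords], collapse = ", "),
         character(1))           # always return exactly one string per input
}

# The second string is a made-up example input
inputs <- c("Mellanox,Asia, China, India, JAVA, United States, APIs",
            "Amazon, West, Germany, Java")
cleaned <- remove_sw(inputs, location_sw)
cleaned
# c("Asia, China, India, United States", "Germany")
```

vapply with character(1) guarantees a plain character vector of the same length as the input, which sapply does not promise.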
Reputation: 6234
Another approach is to strsplit the string into a character vector and then take the setdiff with respect to location_sw:
out <- setdiff(strsplit(str_input, split = ",\\s*")[[1]], location_sw)
out
#> [1] "Asia" "China" "India" "United States"
If necessary, we can paste it back into a single string:
paste(out, collapse = ", ")
#> [1] "Asia, China, India, United States"
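One thing to keep in mind with setdiff is that it uses set semantics, so repeated items are collapsed to a single occurrence. Here is a small sketch with a made-up input containing a repeated item, comparing it with the %in% filter from the base option, which keeps duplicates:

```r
location_sw <- c('Rose', 'Java', 'JAVA', 'Mellanox', 'Microsoft', '144GiB', 'West',
                 'Amazon', 'Channel Asia', 'jClarity', 'APIs')

# Hypothetical input where "Asia" appears twice
x <- strsplit("Asia, JAVA, Asia, China", split = ",\\s*")[[1]]

by_setdiff <- setdiff(x, location_sw)  # c("Asia", "China"): the repeated "Asia" is dropped
by_filter  <- x[!x %in% location_sw]   # c("Asia", "Asia", "China"): duplicates and order kept
```

For lists of distinct locations such as the one in the question, both give the same result.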
Reputation: 627129
You may use
str_input <- c("Mellanox,Asia, China, India, JAVA, United States, APIs")
rx <- paste0("(?:,\\s*)*\\b(?:",paste(location_sw, collapse="|"),")\\b")
trimws(gsub(rx, "", str_input), whitespace = "[\\s,]")
## => [1] "Asia, China, India, United States"
The (?:,\\s*)* part matches zero or more occurrences of a comma followed by zero or more whitespace characters, so a stopword is removed together with the comma in front of it.
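To see what the gsub step does on its own, before trimws runs, here is a sketch on the question's input. ", JAVA" and ", APIs" disappear together with their commas, but "Mellanox" has no comma in front of it, so the comma after it survives as a leading comma. This sketch passes perl = TRUE so that the (?:...) groups and \\b are interpreted by PCRE; that argument is an addition of mine, not part of the answer.

```r
str_input <- c("Mellanox,Asia, China, India, JAVA, United States, APIs")
location_sw <- c('Rose', 'Java', 'JAVA', 'Mellanox', 'Microsoft', '144GiB', 'West',
                 'Amazon', 'Channel Asia', 'jClarity', 'APIs')

# Optional run of ", " followed by a whole-word stopword
rx <- paste0("(?:,\\s*)*\\b(?:", paste(location_sw, collapse = "|"), ")\\b")

# Only the gsub step: the comma that followed "Mellanox" is still there
step1 <- gsub(rx, "", str_input, perl = TRUE)
step1
# ",Asia, China, India, United States"
```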
Calling trimws with whitespace = "[\\s,]" removes leading and trailing whitespace and commas.
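Since rx is built by pasting the stopwords straight into the pattern, any regex metacharacter in a stopword is interpreted as regex syntax. None of the question's stopwords contain one, but a hypothetical entry like "Node.js" would also match "Nodexjs", because "." matches any character. A sketch of one way to guard against that, assuming PCRE (perl = TRUE), wraps each stopword in \\Q...\\E so it is matched literally:

```r
# Hypothetical stopwords and input, chosen to show the problem
sw <- c("Node.js", "JAVA")
x  <- "Nodexjs, Asia, JAVA"

# Unescaped: "Node.js" acts as a regex, so "Nodexjs" is removed too
rx_raw <- paste0("(?:,\\s*)*\\b(?:", paste(sw, collapse = "|"), ")\\b")
res_raw <- trimws(gsub(rx_raw, "", x, perl = TRUE), whitespace = "[\\s,]")
res_raw  # "Asia"

# \\Q...\\E quotes each stopword so PCRE matches it character for character
rx_lit <- paste0("(?:,\\s*)*\\b(?:", paste0("\\Q", sw, "\\E", collapse = "|"), ")\\b")
res_lit <- trimws(gsub(rx_lit, "", x, perl = TRUE), whitespace = "[\\s,]")
res_lit  # "Nodexjs, Asia"
```

Note that \\b still assumes each stopword starts and ends with a word character; a stopword such as "C++" would need a different boundary check.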