I have a string, and I want to remove all non-alphanumeric symbols from it and then put the remaining words into a vector.
So this:
"This is a string. In addition, this is a string!"
would become:
> stringVector1
"This","is","a","string","In","addition","this","is","a","string"
I've looked at grep() but can't find an example that matches. Any suggestions?
Another approach to this question, using the stringr package:
library(stringr)
text <- "This is a string. In addition, this is a string!"
# str_split() returns a list; [[1]] extracts the character vector for the single input string
str_split(str_squish(str_replace_all(text, regex("\\W+"), " ")), " ")[[1]]
#[1] "This" "is" "a" "string" "In" "addition" "this" "is" "a" "string"
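str_replace_all(text, regex("\\W+"), " "): replaces each run of non-word characters (punctuation and spaces) with a single space
str_squish(): removes leading and trailing whitespace and reduces repeated whitespace inside a string to a single space
str_split(): splits a string into pieces at each space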
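For comparison, here is a minimal sketch of the same three steps in base R, without stringr. It assumes gsub() with perl = TRUE as a stand-in for str_replace_all(), and trimws() in place of str_squish(); trimws() is enough here because the first step already collapses every run of non-word characters (spaces included) into one space, so only the trailing space needs removing.

```r
text <- "This is a string. In addition, this is a string!"

# Step 1: replace each run of non-word characters (punctuation and spaces) with one space
step1 <- gsub("\\W+", " ", text, perl = TRUE)

# Step 2: trim the trailing space left where "!" used to be
step2 <- trimws(step1)

# Step 3: split on single spaces; [[1]] turns the one-element list into a character vector
words <- strsplit(step2, " ", fixed = TRUE)[[1]]
words
```

Like the stringr version, \W treats the underscore as a word character, so underscores are kept inside words.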
Here is an example using base R:
> str <- "This is a string. In addition, this is a string!"
> str
[1] "This is a string. In addition, this is a string!"
> strsplit(gsub("[^[:alnum:] ]", "", str), " +")[[1]]
[1] "This" "is" "a" "string" "In" "addition" "this" "is" "a"
[10] "string"
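A possible variation, sketched here as a hypothetical helper split_words() (not part of the answer above), splits directly on runs of non-alphanumeric characters, so punctuation and spaces are handled in one pass and several strings can be processed at once. It assumes that dropping empty strings is acceptable, because strsplit() returns a leading "" when a string starts with a separator.

```r
# Hypothetical helper: split each string on runs of characters that are
# neither letters nor digits, then drop any empty pieces
split_words <- function(x) {
  pieces <- strsplit(x, "[^[:alnum:]]+")
  # A string beginning with punctuation yields a leading "", so filter it out
  lapply(pieces, function(p) p[nzchar(p)])
}

sentences <- c("This is a string. In addition, this is a string!",
               "...and another one?")
res <- split_words(sentences)
res[[1]]
res[[2]]
```

One design difference worth noting: gsub("[^[:alnum:] ]", "", ...) deletes punctuation, so "don't" becomes "dont", whereas splitting on punctuation turns it into "don" and "t". Use unlist(split_words(x)) if you want a single flat vector.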