screechOwl

Reputation: 28139

Remove non-alphanumeric symbols from a string

I have a string and I want to remove all non-alphanumeric symbols from it and then put the remaining words into a vector.

So this:

"This is a string.  In addition, this is a string!" 

would become:

>stringVector1

"This","is","a","string","In","addition","this","is","a","string"

I've looked at grep() but can't find an example that matches. Any suggestions?

Upvotes: 29

Views: 54607

Answers (2)

Mike V

Reputation: 1354

Another approach, using the stringr package:

library(stringr)
text =  c("This is a string.  In addition, this is a string!")
str_split(str_squish(str_replace_all(text, regex("\\W+"), " ")), " ")[[1]]
#[1] "This"     "is"       "a"        "string"   "In"       "addition" "this"     "is"       "a"        "string"  
  • str_replace_all(text, regex("\\W+"), " "): replaces each run of non-word characters with a single space
  • str_squish(): trims leading and trailing whitespace and collapses repeated whitespace inside the string
  • str_split(): splits the string into pieces on the given separator; it returns a list, so [[1]] extracts the character vector
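
One caveat with "\\W" is that the underscore counts as a word character, so it is not removed. The sketch below uses base R rather than stringr, purely to illustrate the difference between "\\W" and "[^[:alnum:]]"; the sample string and variable names are made up for the illustration.

```r
# Illustrative input containing an underscore (not from the question).
x <- "snake_case, v2!"

# "\\W+" matches runs of non-word characters; "_" is a word character,
# so it survives the replacement.
with_W <- trimws(gsub("\\W+", " ", x, perl = TRUE))

# "[^[:alnum:]]+" matches runs of anything that is not a letter or digit,
# so "_" is replaced as well.
with_alnum <- trimws(gsub("[^[:alnum:]]+", " ", x))

words_W <- strsplit(with_W, " ")[[1]]
words_alnum <- strsplit(with_alnum, " ")[[1]]

print(words_W)      # "snake_case" "v2"
print(words_alnum)  # "snake" "case" "v2"
```

If underscores should be treated as separators too, the "[^[:alnum:]]" form is probably the safer choice.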

Upvotes: 6

kohske

Reputation: 66842

Here is an example:

> str <- "This is a string. In addition, this is a string!"
> str
[1] "This is a string. In addition, this is a string!"
> strsplit(gsub("[^[:alnum:] ]", "", str), " +")[[1]]
 [1] "This"     "is"       "a"        "string"   "In"       "addition" "this"     "is"       "a"       
[10] "string"  
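
For reuse, the same two steps could be wrapped in a small function. This is only a sketch: the name split_alnum_words is hypothetical, and the extra filtering step is an assumption about how leading spaces should be handled (strsplit() returns an empty first element when the string starts with the separator).

```r
# Hypothetical helper wrapping the gsub() + strsplit() approach above.
split_alnum_words <- function(x) {
  # Remove everything except letters, digits and spaces.
  cleaned <- gsub("[^[:alnum:] ]", "", x)
  # Split on runs of one or more spaces.
  words <- strsplit(cleaned, " +")[[1]]
  # Drop the empty string that appears if the input starts with spaces.
  words[nzchar(words)]
}

stringVector1 <- split_alnum_words("This is a string.  In addition, this is a string!")
print(stringVector1)
```

Note that punctuation is deleted rather than replaced by a space, so "a,b" becomes the single word "ab"; replacing with " " instead of "" would split it into two words.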

Upvotes: 54
