I have a data set of strings, and special characters like the one below appear in it.
How do I remove special characters like the above from my data set?
Use regular expressions to remove unwanted characters, for example:
dataset$textcolumn <- gsub("[^\\w\\s]", "", dataset$textcolumn, perl=TRUE)
This removes everything except word characters (letters, digits and the underscore) and whitespace. For more complex replacements, see the help topic ?regexp.
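A minimal sketch of the gsub call above on a small, made-up data frame (the sample strings are hypothetical; textcolumn just mirrors the column name used in the answer). It also shows a POSIX-class variant that, unlike \w, drops the underscore too.

```r
# Hypothetical sample data; 'textcolumn' mirrors the column name used above
dataset <- data.frame(
  textcolumn = c("Hello, world!", "price: $5 (approx.)", "keep_this 123"),
  stringsAsFactors = FALSE
)

# Remove every character that is neither a word character (\w) nor whitespace (\s).
# perl = TRUE makes gsub use PCRE, where \w and \s are valid inside [...]
dataset$textcolumn <- gsub("[^\\w\\s]", "", dataset$textcolumn, perl = TRUE)
print(dataset$textcolumn)
# "Hello world"    "price 5 approx" "keep_this 123"

# Variant with POSIX classes: keep only alphanumerics and spaces,
# so the underscore is removed as well
alnum_only <- gsub("[^[:alnum:] ]", "", "keep_this 123")
print(alnum_only)
# "keepthis 123"
```

Note that \s keeps tabs and newlines as well as spaces; use a literal space in the character class if only spaces should survive.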
Also check the encoding: the text may be correct, but read with the wrong encoding assumed. The functions Encoding and iconv are helpful here.
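A rough sketch of checking and fixing the encoding with Encoding and iconv, assuming (hypothetically) that the text is Latin-1 bytes, here the string "café" built by hand for illustration:

```r
# Hypothetical input: "café" stored as Latin-1 bytes (0xE9 is "é" in Latin-1)
x <- "caf\xe9"
Encoding(x) <- "latin1"   # declare the encoding the bytes are actually in
print(Encoding(x))        # "latin1"

# Convert to UTF-8 so the accented character is interpreted correctly
y <- iconv(x, from = "latin1", to = "UTF-8")
print(Encoding(y))        # "UTF-8"

# Alternatively, drop anything that cannot be represented in ASCII;
# sub = "" replaces each non-convertible byte with nothing
z <- iconv(y, from = "UTF-8", to = "ASCII", sub = "")
print(z)                  # "caf"
```

Converting to the right encoding keeps characters like "é" intact, whereas the ASCII conversion with sub = "" removes them, so pick whichever matches what the data set should contain.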