Reputation: 1479
I was attempting to replace what I thought was a standard dash using gsub. The code I was testing was:
gsub("-", "ABC", "reported – estimate")
This does nothing, though. I copied and pasted the dash into http://unicodelookup.com/#–/1 and it seems to be an en dash. That site provides the hex, decimal, etc. codes for an en dash, and I've been trying to replace the en dash but am not having any luck. Suggestions?
(As a bonus, if you can tell me whether there is a function to identify special characters, that would be helpful.)
I'm not sure if SO's code formatting will change the dash format so here is the dash I'm using (–).
Upvotes: 6
Views: 5087
Reputation: 626927
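The "-" in your pattern is an ASCII hyphen-minus (U+002D), which is a different character from the en dash (U+2013) in your string, so nothing matches. You can replace the en dash by specifying it directly in the regex pattern: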
gsub("–", "ABC", "reported – estimate")
You can match all hyphens, en- and em-dashes with
gsub("[-–—]", "ABC", "reported – estimate — more - text")
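If pasting the literal dash into a script feels fragile (editors and file encodings can silently change it), the same patterns can be written with Unicode escapes. The following is a minimal sketch rather than part of the original answer: the variable names are made up for illustration, and the `\\p{Pd}` variant assumes `perl = TRUE` so that PCRE's Unicode "dash punctuation" property is available.

```r
# Input with an en dash (U+2013), an em dash (U+2014) and an ASCII hyphen.
x <- "reported \u2013 estimate \u2014 more - text"

# Replace only the en dash, written as an escape instead of a pasted character.
en_only <- gsub("\u2013", "ABC", x, fixed = TRUE)

# Replace hyphen, en dash and em dash with an explicit character class.
dash_class <- gsub("[-\u2013\u2014]", "ABC", x)

# Replace any character in the Unicode "dash punctuation" category (Pd).
any_dash <- gsub("\\p{Pd}", "ABC", x, perl = TRUE)

print(en_only)
print(dash_class)
print(any_dash)
```

The explicit class is the easiest to read. The `\\p{Pd}` form may be handy when the text can contain other dash variants that are not worth listing by hand.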
To check if there are non-ascii characters in a string, use
> s = "plus ça change, plus c'est la même chose"
> gsub("[[:ascii:]]+", "", s, perl=TRUE)
[1] "çê"
You will get either an empty string (if the string consists only of ASCII characters) or, as here, the non-ASCII "special" characters.
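To find out which character a "special" one actually is, it can help to look at code points rather than glyphs. Here is a small sketch using base R's `utf8ToInt` and `intToUtf8`; the variable names are hypothetical, and the 127 cutoff simply assumes that "special" means "outside ASCII".

```r
# A single string containing an en dash, written as an escape for clarity.
s <- "reported \u2013 estimate"

# One integer code point per character of the string.
code_points <- utf8ToInt(s)

# Keep only the code points outside the ASCII range (0 to 127).
special <- code_points[code_points > 127L]

# The offending characters and their labels in the usual U+XXXX notation.
special_chars <- intToUtf8(special, multiple = TRUE)
special_labels <- sprintf("U+%04X", special)

print(special_chars)
print(special_labels)
```

Note that `utf8ToInt` expects a single string, so a longer vector would need a loop or `lapply`. For scanning whole files, `tools::showNonASCII` is another option worth trying.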
Upvotes: 6
Reputation: 1173
For special character replacement, you can use a negated character class.
gsub('[^\\w]+', 'ABC', 'reported - estimate', perl = TRUE)
This replaces each run of non-word characters with ABC. The [^\w] pattern matches any character that is not a letter, digit or underscore, so spaces are replaced as well: the call above returns "reportedABCestimate".
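If the goal is to replace punctuation such as dashes while leaving spaces and accented letters alone, one option is to negate the Unicode letter, number and whitespace classes instead. This is only a sketch, under the assumption that "special" means "not a letter, digit or whitespace"; it is an alternative to the \w version above, not the answerer's method, and the variable names are illustrative.

```r
# Text with an accented letter (U+00E7) and an en dash (U+2013).
txt <- "plus \u00e7a \u2013 change"

# Match runs of characters that are not letters (\p{L}), numbers (\p{N})
# or whitespace (\s); perl = TRUE is needed for the \p{...} classes.
cleaned <- gsub("[^\\p{L}\\p{N}\\s]+", "ABC", txt, perl = TRUE)

print(cleaned)
```

Here the en dash is replaced, while the accented letter and the surrounding spaces are left as they are.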
Upvotes: 3