Reputation: 174776
I just answered the question "Removing characters after a EURO symbol in R", but the R code that works for others on Ubuntu does not work for me.
This is my code:
x <- "services as defined in this SOW at a price of € 15,896.80 (if executed fro"
euro <- "\u20AC"
gsub(paste(euro , "(\\S+)|."), "\\1", x)
# [1] ""
I think this comes down to the locale settings, but I don't know how to change them.
I'm running RStudio on Windows 8.
> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
loaded via a namespace (and not attached):
[1] tools_3.2.0
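To go with the session info above, the encoding-related state of a session can be inspected directly. This is only a minimal diagnostic sketch using base R; what `Encoding(x)` reports for the literal string depends on the platform and on how the string was entered, so no particular value is assumed for it here.

```r
# Character-set settings of the current session
Sys.getlocale("LC_CTYPE")
info <- l10n_info()   # list with MBCS, UTF-8 and Latin-1 flags
info[["UTF-8"]]       # TRUE only when the session runs in a UTF-8 locale

# A string built from a \u escape is always marked as UTF-8
euro <- "\u20AC"
Encoding(euro)
charToRaw(euro)       # the three UTF-8 bytes of the euro sign

# A string typed as a literal carries whatever encoding the session gave it
x <- "services as defined in this SOW at a price of € 15,896.80 (if executed fro"
Encoding(x)
```

If `Encoding(x)` and `Encoding(euro)` differ, the pattern and the subject string are not in the same encoding, which would be consistent with the empty result above.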
@Anada's answer is good, but it means we have to set the encoding
every time we use Unicode characters in a regex. Is there any way to change the default encoding to UTF-8
on Windows?
Upvotes: 7
Views: 347
Reputation: 193637
This seems to be a problem with encoding.
Consider:
x <- "services as defined in this SOW at a price of € 15,896.80 (if executed fro"
euro <- "\u20AC"  # same definition as in the question
gsub(paste(euro , "(\\S+)|."), "\\1", x)
# [1] ""
gsub(paste(euro , "(\\S+)|."), "\\1", `Encoding<-`(x, "UTF8"))
# [1] "15,896.80"
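To avoid repeating the encoding step at every call, it could be wrapped in a small helper. The sketch below is an assumption, not something tested on the Windows setup from the question: `extract_after_euro` is a hypothetical name, and it uses base R's `enc2utf8()` to convert the input instead of the `Encoding<-` call above. `enc2utf8()` relies on the string's declared encoding being correct, so it may not help if the string is mislabelled.

```r
euro <- "\u20AC"

# Hypothetical helper: normalise the input to UTF-8 once, then apply
# the same pattern as above.
extract_after_euro <- function(text) {
  text <- enc2utf8(text)  # assumes the declared encoding of `text` is right
  gsub(paste(euro, "(\\S+)|."), "\\1", text)
}

# The euro sign is written as an escape here so the example does not
# depend on the encoding of the script file.
x <- "services as defined in this SOW at a price of \u20AC 15,896.80 (if executed fro"
result <- extract_after_euro(x)
result
```

Because both the pattern and the input are UTF-8 by the time `gsub()` runs, the match no longer depends on the session's native code page.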
Upvotes: 6