Reputation: 78
I've read the threads and package updates for encoding issues with Shiny, but I have a (difficult-to-reproduce example) database-driven Shiny app which is fumbling some special characters.
In my PostgreSQL database, my Swedish river is stored correctly as "Upper Umeälven River", but when I pull it back into the Shiny interface with dplyr:
names.rivers <- filter(tbl.rivers, Country == "Sweden")
...it becomes "Upper UmeÃ¤lven River" in R.
I'm using UTF-8 encoding locally; I guess I'm losing something on the exchange with the database.
Sys.getlocale()
[1] "LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;LC_MONETARY=French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252"
Apologies again for the lack of an example; it's ONLY an issue when pulling from the database. I suspect I'm missing a flag on some sanitizing function someplace, but I need some help getting pointed in the right direction.
Upvotes: 2
Views: 1139
Reputation: 78
As suspected, the answer was simple:
iconv(vector.to.convert, "UTF-8")
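For anyone landing here later, a minimal sketch of what that call does, assuming the database driver hands back UTF-8 bytes that R treats as native-locale text. The byte vector and the name `from_db` are made up for illustration, and `to = "latin1"` is spelled out only so the result is the same on any machine (the call above leaves `to` empty, which means the session's native encoding):

```r
# "Umeälven" as UTF-8 bytes; the "ä" is the pair 0xC3 0xA4.
# Built from raw bytes so this script's own file encoding does not matter.
from_db <- rawToChar(as.raw(c(0x55, 0x6D, 0x65, 0xC3, 0xA4, 0x6C, 0x76, 0x65, 0x6E)))

# Convert from UTF-8 to Latin 1: the two octets collapse into the single byte 0xE4.
fixed <- iconv(from_db, from = "UTF-8", to = "latin1")

charToRaw(fixed)  # 55 6d 65 e4 6c 76 65 6e
```

In the original setup the same idea would be applied to the character column coming back from the query.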
My "learnings":
My understanding is a bit shallow, but - frankly - I'm not digging deeper into the world of character encoding for the moment. I hope it helps someone else avoid the error!
Upvotes: 1
Reputation: 3654
In your code page, 1252 (Windows Latin 1), the 'ä' in "Upper Umeälven River" is the code point 0xE4 (binary 11100100).
The 'Ã¤' in "Upper UmeÃ¤lven River", in the same code page, is the two octets 0xC3 0xA4 (binary XXX00011 XX100100, where X marks the UTF-8 prefix bits).
However, if you apply the UTF-8 encoding rules to those two octets, the significant bits (00011 100100) are exactly the same as 0xE4.
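The bit arithmetic can be checked in R. This is only a sketch; the variable names are illustrative, and the octets are generated from the code point so the script's own file encoding does not matter:

```r
# UTF-8 octets for U+00E4 ("ä"): 0xC3 0xA4, i.e. 195 164.
octets <- as.integer(charToRaw(intToUtf8(0xE4)))

# A two-octet UTF-8 sequence has the shape 110xxxxx 10xxxxxx.
lead_bits <- bitwAnd(octets[1], 0x1FL)  # drop the 110 prefix, keep 00011
cont_bits <- bitwAnd(octets[2], 0x3FL)  # drop the 10 prefix, keep 100100

# Shift the lead bits left by 6 (multiply by 64) and append the continuation bits.
code_point <- lead_bits * 64L + cont_bits

sprintf("0x%X", code_point)  # "0xE4"
```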
Somewhere, an inadvertent or erroneous character conversion is encoding the character as UTF-8 while the string is still treated as being in the Windows Latin 1 code page.
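A rough way to reproduce that mismatch in R, assuming nothing about the actual driver involved: take the UTF-8 octets for 'ä' and label them as Latin 1. The variable names are illustrative:

```r
# The UTF-8 octets for "ä" (0xC3 0xA4), generated from the code point.
utf8_octets <- charToRaw(intToUtf8(0xE4))

# Wrong label: the same two bytes declared to be Latin 1 text.
mislabelled <- rawToChar(utf8_octets)
Encoding(mislabelled) <- "latin1"

# Each byte is now read as its own character: U+00C3 ("Ã") and U+00A4 ("¤").
seen_as <- utf8ToInt(enc2utf8(mislabelled))
seen_as  # 195 164
```

Two code points come out where one went in, which is the "Ã¤" seen in the Shiny output.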
Perhaps the data is already being received in UTF-8, and you can change the receiving code page to reflect that. There may be a silent transformation happening somewhere further back, with no indication of it.
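If the bytes really are arriving as UTF-8, one way to tell R so (as opposed to converting them) is to mark the character columns. A sketch under that assumption; `mark_utf8` is a hypothetical helper, not part of dplyr or any database package, and the sample data frame is built by hand:

```r
# Hypothetical helper: declare every character column of a data frame to be UTF-8.
# This only changes the encoding label; the underlying bytes are left untouched.
mark_utf8 <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], function(col) {
    Encoding(col) <- "UTF-8"
    col
  })
  df
}

# Stand-in for a query result: "Umeälven" as unlabelled UTF-8 bytes.
rivers <- data.frame(
  River = rawToChar(as.raw(c(0x55, 0x6D, 0x65, 0xC3, 0xA4, 0x6C, 0x76, 0x65, 0x6E))),
  Country = "Sweden",
  stringsAsFactors = FALSE
)

rivers <- mark_utf8(rivers)
Encoding(rivers$River)  # "UTF-8"
```

Pure ASCII values such as "Sweden" keep the "unknown" label, which is harmless. Whether marking or converting is the right move depends on what the driver actually returns, so it is worth inspecting `charToRaw()` on one affected value first.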
Upvotes: 1