Reputation: 592
The extended ASCII character for '\xfe' is 'þ'.
However, when I try printing the character, I get:
> print('\xfe')
[1] "\376"
Is there a way to print 'þ'?
EDIT to add context as requested:
> getOption("encoding")
[1] "native.enc"
> l10n_info()
$MBCS
[1] FALSE
$`UTF-8`
[1] FALSE
$`Latin-1`
[1] FALSE
> Sys.getlocale()
[1] "C"
Upvotes: 2
Views: 1438
Reputation: 496
Are you sure "\xfe" is an extended ASCII character? As far as I know, byte 0xFE is "þ" in Latin-1 (ISO 8859-1), and U+00FE is the same character in Unicode, but ASCII itself stops at 0x7F.
Anyway, this is how I got the print you asked for:
x <- "\xfe"
Encoding(x)
#[1] "latin1"
l10n_info() will give you your locale charset (in my case, the locale is Latin-1):
l10n_info()
#$MBCS
#[1] FALSE
#$`UTF-8`
#[1] FALSE
#$`Latin-1`
#[1] TRUE
#$codepage
#[1] 1252
Since both x and the locale charset are latin1, print() will correctly display the character you want:
print(x)
#[1] "þ"
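If the locale is not Latin-1, one possible workaround is to declare the byte's encoding explicitly and convert it to UTF-8. This is only a sketch: it assumes the byte 0xFE is meant as Latin-1, and whether the result actually displays as a thorn still depends on the locale and terminal being able to show it.

```r
# Assumption: the single byte 0xFE is intended as Latin-1 "thorn".
x <- "\xfe"
Encoding(x) <- "latin1"          # declare how the byte should be interpreted

# Convert to UTF-8; this does not depend on the current locale.
y <- iconv(x, from = "latin1", to = "UTF-8")

charToRaw(y)   # c3 be  (the UTF-8 bytes for U+00FE)
utf8ToInt(y)   # 254    (the code point, 0xFE)
```

Inspecting the bytes with charToRaw() is a way to confirm the string is right even when print() falls back to an escape such as "\376".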
Upvotes: 0
Reputation: 121127
I can reproduce the issue on a Linux machine in a C locale:
.Platform$OS.type
## [1] "unix"
Sys.getlocale()
## [1] "C"
'\xfe'
## [1] "\376"
On a Windows machine, it correctly prints a thorn, even when the locale is "C".
If I change the LC_CTYPE part of the locale to something with a UTF-8 suffix, and use the \u specification that Scott Chamberlain suggested, I can correctly print a thorn.
# Easy test, thorn is a common char: Icelandic
Sys.setlocale("LC_CTYPE", "is_IS.utf8")
'\u00FE'
## [1] "þ"
# Harder test, thorn is very rare: English
Sys.setlocale("LC_CTYPE", "en_GB.utf8")
'\u00FE'
## [1] "þ"
# Even harder test, thorn is unused: Arabic
Sys.setlocale("LC_CTYPE", "ar_QA.utf8")
'\u00FE'
## [1] "þ"
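If you only need the UTF-8 locale for a moment, the switch can be made temporary. The following is a rough sketch rather than a definitive recipe: the variable names are made up for illustration, and the locale name "en_GB.utf8" is taken from the tests above and may not be installed on every system.

```r
# Remember the current character-type locale so it can be restored.
old_ctype <- Sys.getlocale("LC_CTYPE")

# Sys.setlocale() returns "" (with a warning) if the locale is unavailable.
new_ctype <- suppressWarnings(Sys.setlocale("LC_CTYPE", "en_GB.utf8"))

if (nzchar(new_ctype)) {
  print("\u00FE")   # printed while the UTF-8 locale is active
} else {
  message("Locale 'en_GB.utf8' is not available on this system")
}

# Put the original setting back.
invisible(Sys.setlocale("LC_CTYPE", old_ctype))
```

On Linux, running `locale -a` in a shell lists the locale names that are installed, which helps in picking one that Sys.setlocale() will accept.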
Upvotes: 0
Reputation: 5903
Did you try '\u00FE'? Hopefully that isn't different in different locales.
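A quick way to check this, as a sketch: the \u escape stores the character as UTF-8 bytes, so the contents of the string should be the same in any locale, even if print() renders it differently.

```r
s <- "\u00FE"

charToRaw(s)   # c3 be  (UTF-8 bytes, independent of the locale)
utf8ToInt(s)   # 254    (code point U+00FE)
Encoding(s)    # how R has marked the string's encoding
```

What does vary by locale is the display: in a "C" locale print() may show an escape instead of the glyph.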
Upvotes: 1