Rod Michael Coronel

Reputation: 592

Extended ASCII in R

The extended ASCII character for '\xfe' is 'þ'.

However, when I try printing the character, I get:

> print('\xfe')
[1] "\376"

Is there a way to print 'þ'?

EDIT to add context as requested:

> getOption("encoding")
[1] "native.enc"
> l10n_info()
$MBCS
[1] FALSE

$`UTF-8`
[1] FALSE

$`Latin-1`
[1] FALSE

> Sys.getlocale()
[1] "C"

Upvotes: 2

Views: 1438

Answers (3)

Maria Wollestonecraft

Reputation: 496

Are you sure the extended ASCII character for "\xfe" is "þ"? As far as I know, that mapping holds for Latin-1, where the byte 0xFE is "þ". In Unicode the same character is the code point U+00FE, which UTF-8 encodes as two bytes.

Anyway, this is how I got the print you asked for:

x <- "\xfe"
Encoding(x)
#[1] "latin1"

l10n_info() reports your locale's character set (in my case, the locale is Latin-1):

l10n_info()
#$MBCS
#[1] FALSE

#$`UTF-8`
#[1] FALSE

#$`Latin-1`
#[1] TRUE

#$codepage
#[1] 1252

Since both x and the locale charset are latin1, print() displays the character you want correctly:

print(x)
#[1] "þ"
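If the locale is not Latin-1, one option is to declare the byte as Latin-1 yourself and convert it to UTF-8. This is a minimal sketch using only base R (Encoding<-, enc2utf8, charToRaw). Whether the result then displays as "þ" still depends on the locale and the console, so no printed output is claimed here.

```r
# The single byte 0xFE, as in the question
x <- "\xfe"

# Declare that this byte should be interpreted as Latin-1
Encoding(x) <- "latin1"

# Convert to UTF-8; iconv(x, from = "latin1", to = "UTF-8") is an alternative
y <- enc2utf8(x)

# Inspect the bytes: U+00FE is encoded in UTF-8 as the two bytes c3 be
charToRaw(y)
```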

Upvotes: 0

Richie Cotton

Reputation: 121127

I can reproduce the issue on a Linux machine in a C locale:

.Platform$OS.type
## [1] "unix"
Sys.getlocale()
## [1] "C"
'\xfe'
## [1] "\376"

On a Windows machine, it correctly prints a thorn, even when the locale is "C".

If I change the LC_CTYPE part of the locale to something with a UTF-8 suffix, and use the \u specification that Scott Chamberlain suggested, I can correctly print a thorn.

# Easy test, thorn is a common char: Icelandic
Sys.setlocale("LC_CTYPE", "is_IS.utf8") 
'\u00FE'
## [1] "þ"

# Harder test, thorn is very rare: English
Sys.setlocale("LC_CTYPE", "en_GB.utf8") 
'\u00FE'
## [1] "þ"

# Even harder test, thorn is unused: Arabic
Sys.setlocale("LC_CTYPE", "ar_QA.utf8") 
'\u00FE'
## [1] "þ"
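If you would rather not change the locale for the whole session, the switch can be wrapped so that LC_CTYPE is restored afterwards. This is only a sketch: with_ctype is a hypothetical helper name, not a base R function, and locale names such as "en_GB.utf8" are platform-dependent and may not be installed on your system.

```r
# Hypothetical helper: evaluate `expr` with LC_CTYPE temporarily set to `locale`
with_ctype <- function(locale, expr) {
  old <- Sys.getlocale("LC_CTYPE")
  # Restore the previous setting on exit, even if an error occurs
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  # Sys.setlocale() returns "" (with a warning) when the locale is unavailable
  res <- suppressWarnings(Sys.setlocale("LC_CTYPE", locale))
  if (!nzchar(res)) stop("locale not available: ", locale)
  # `expr` is evaluated lazily, so it runs only now, under the new locale
  force(expr)
}

# Usage: report a message instead of failing if the locale is missing
tryCatch(
  with_ctype("en_GB.utf8", print("\u00FE")),
  error = function(e) message(conditionMessage(e))
)
```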

Upvotes: 0

sckott

Reputation: 5903

Did you try '\u00FE'? Hopefully that doesn't vary across locales.
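To check that the \u form refers to a code point rather than a locale-specific byte, you can build the character from its integer value and look at what comes back. A small sketch with base R's intToUtf8, utf8ToInt and charToRaw; the variable name thorn is just for illustration:

```r
# Build the character from its Unicode code point (0xFE == 254)
thorn <- intToUtf8(254L)

# Round trip back to the code point
utf8ToInt(thorn)   # 254

# The UTF-8 encoding is two bytes, unlike the single Latin-1 byte 0xFE
charToRaw(thorn)   # c3 be
```

Whether print(thorn) shows "þ" or an escape still depends on the locale being able to represent it.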

Upvotes: 1
