I am using Ruby 2.3.
I have the following string: "\xFF\xFE". I do a File.binread() on a file containing it, so the encoding of this string is ASCII-8BIT. In my code, I check whether this string was indeed read by comparing it to the literal string "\xFF\xFE" (which has encoding UTF-8, as Ruby string literals do by default).
However, the comparison returns false, even though both strings contain the same bytes; the only difference is that one has encoding ASCII-8BIT and the other UTF-8.
I have two questions: (1) why does it return false? and (2) what is the best way to go about achieving what I want? I just want to check whether the string I read matches "\xFF\xFE".
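A minimal reproduction of the situation described above, as a sketch for experimenting with. It assumes a UTF-8 source file (Ruby's default) and writes the two bytes to a temporary file that stands in for the real one; the file name `ff_fe.bin` is arbitrary.

```ruby
require 'tmpdir'

# Write the bytes 0xFF 0xFE to a hypothetical file, then read them back raw.
data = Dir.mktmpdir do |dir|
  path = File.join(dir, 'ff_fe.bin')
  File.binwrite(path, "\xFF\xFE".b)
  File.binread(path)           # the block's value is returned by mktmpdir
end

literal = "\xFF\xFE"           # a string literal in a UTF-8 source file

puts data.encoding             # ASCII-8BIT
puts literal.encoding          # UTF-8
p data.bytes == literal.bytes  # true: the bytes are identical
p data == literal              # false: the encodings differ
```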
(1) why does it return false?
When comparing strings, they either have to be in the same encoding or their characters must be encodable in US-ASCII.
Comparison works as expected if the string only contains byte values 0 to 127 (0b0xxxxxxx):
a = 'E'.encode('ISO8859-1') #=> "E"
b = 'E'.encode('ISO8859-15') #=> "E"
a.bytes #=> [69]
b.bytes #=> [69]
a == b #=> true
It fails if the string contains any byte value from 128 to 255 (0b1xxxxxxx):
a = 'É'.encode('ISO8859-1') #=> "\xC9"
b = 'É'.encode('ISO8859-15') #=> "\xC9"
a.bytes #=> [201]
b.bytes #=> [201]
a == b #=> false
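The two encodings from the question follow the same rule. In this sketch (assuming a UTF-8 source file, so plain literals are UTF-8), `String#b` produces an ASCII-8BIT copy and `String#ascii_only?` reports whether every byte is in the 0 to 127 range; the variable names are only illustrative.

```ruby
# ASCII-only content: equal bytes compare equal across encodings.
ascii_binary = "AB".b          # ASCII-8BIT, bytes [65, 66]
ascii_utf8   = "AB"            # UTF-8,      bytes [65, 66]
p ascii_binary.ascii_only?     # true
p ascii_binary == ascii_utf8   # true

# Non-ASCII content: equal bytes, but == is false across encodings.
high_binary = "\xFF\xFE".b     # ASCII-8BIT, bytes [255, 254]
high_utf8   = "\xFF\xFE"       # UTF-8 (not valid UTF-8), bytes [255, 254]
p high_binary.ascii_only?      # false
p high_binary.bytes == high_utf8.bytes  # true
p high_binary == high_utf8     # false
```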
Your string can't be represented in US-ASCII, because both of its bytes are outside the US-ASCII range (0 to 127):
"\xFF\xFE".bytes #=> [255, 254]
Attempting to convert it doesn't produce any meaningful result:
"\xFF\xFE".encode('US-ASCII', 'ASCII-8BIT', :undef => :replace)
#=> "??"
The comparison will therefore return false whenever your string is compared to a string in another encoding, regardless of content.
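One way to check the "regardless of content" point is to tag the very same two bytes with a few other encodings and compare each against the ASCII-8BIT version. This is a sketch; the list of encodings is an arbitrary sample, and `String#force_encoding` only changes the tag, not the bytes.

```ruby
binary = "\xFF\xFE".b   # what File.binread would return

# Same bytes, different encoding tags (an arbitrary sample of encodings).
results = %w[ASCII-8BIT UTF-8 ISO-8859-1 Windows-1252].map do |name|
  other = binary.dup.force_encoding(name)
  [name, other == binary]
end.to_h

p results   # only "ASCII-8BIT" maps to true
```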
(2) what is the best way to go about achieving what I want?
You could compare your string to a string with the same encoding. binread returns a string in ASCII-8BIT encoding, so you could use String#b to create a compatible one:
IO.binread('your_file', 2) == "\xFF\xFE".b
or you could compare its bytes:
IO.binread('your_file', 2).bytes == [0xFF, 0xFE]
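Both checks can be wrapped in small helpers and exercised against temporary files. This is a sketch: `starts_with_ff_fe?` and `ff_fe_bytes?` are hypothetical names, and the sample files only exist for the demonstration. It also shows `Array#pack('C*')`, which builds the same ASCII-8BIT string from byte values, as an alternative to the `.b` literal.

```ruby
require 'tmpdir'

# Option 1 (hypothetical helper): compare against a literal made ASCII-8BIT with String#b.
def starts_with_ff_fe?(path)
  # IO.binread returns an ASCII-8BIT string, or nil at end of file
  # (e.g. for an empty file); nil == "..." is simply false.
  IO.binread(path, 2) == "\xFF\xFE".b
end

# Option 2 (hypothetical helper): compare raw byte values, so encoding plays no part.
def ff_fe_bytes?(path)
  head = IO.binread(path, 2)
  !head.nil? && head.bytes == [0xFF, 0xFE]   # guard: nil has no #bytes
end

# Array#pack('C*') also yields an ASCII-8BIT string from byte values.
p [0xFF, 0xFE].pack('C*') == "\xFF\xFE".b   # true

results = Dir.mktmpdir do |dir|
  samples = {
    'marker.bin'   => "\xFF\xFEhello".b,   # starts with FF FE
    'reversed.bin' => "\xFE\xFFhello".b,   # same bytes, wrong order
    'empty.bin'    => ''
  }
  samples.map do |name, content|
    path = File.join(dir, name)
    File.binwrite(path, content)
    [starts_with_ff_fe?(path), ff_fe_bytes?(path)]
  end
end

p results   # [[true, true], [false, false], [false, false]]
```

Comparing with `.b` keeps the check to a single `==`; comparing `bytes` sidesteps encodings entirely at the cost of building a small array.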