Reputation: 5095
When using the Net::HTTP class (module?), I have a problem: even though the response's Content-Type header sets the charset to ISO-8859-1, the encoding of the response body is ASCII-8BIT.
I am not 100% sure why or how these two encodings differ, but I do know that only the ISO-8859-1 encoding lets me transcode into UTF-8. To wit:
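require 'net/http'
# uri is a URI object for the page being fetched
response = Net::HTTP.start(uri.host, uri.port) do |http|
  request = Net::HTTP::Get.new uri
  http.request request # the block's value is returned by Net::HTTP.start
end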
response['Content-Type']
=> "text/html;charset=ISO-8859-1"
response.body.encoding
=> #<Encoding:ASCII-8BIT>
response.body.encode(Encoding::UTF_8)
Encoding::UndefinedConversionError: "\xE9" from ASCII-8BIT to UTF-8
What is going on here? If I force_encoding the response's body to Encoding::ISO_8859_1, then the transcoding works.
Is Net::HTTP at fault?
Upvotes: 1
Views: 916
Reputation: 84124
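Ruby's Net::HTTP does not set the encoding of the response body automatically (see ticket); the body is always tagged as ASCII-8BIT.
That is a slightly misleading encoding name, since it actually means "arbitrary binary data". Ruby defines no conversion from binary data to UTF-8 for bytes above 0x7F, which is why "\xE9" raises an error, whereas ISO-8859-1 maps every byte to a character. This is why you need to use force_encoding to set the encoding before you can transcode to other encodings. Note that force_encoding only relabels the string; it does not change its bytes.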
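As a minimal sketch of that approach: the helper below reads the charset from a Content-Type value, relabels the binary body with it, and then transcodes to UTF-8. The method name body_to_utf8, the fallback charset, and the simulated body are assumptions made for illustration; none of this is part of Net::HTTP itself.

```ruby
# Hypothetical helper (not part of Net::HTTP): relabel a binary body with the
# charset named in the Content-Type header, then transcode it to UTF-8.
def body_to_utf8(body, content_type, fallback = "ISO-8859-1")
  # Pull the charset parameter out of e.g. "text/html;charset=ISO-8859-1".
  charset = content_type.to_s[/charset=([^;\s]+)/i, 1] || fallback
  # force_encoding only changes the label, not the bytes; encode transcodes.
  body.dup.force_encoding(charset).encode(Encoding::UTF_8)
end

# Simulated response body: "café" as ISO-8859-1 bytes, tagged as binary,
# which is how Net::HTTP hands the body back.
raw = "caf\xE9".b
utf8 = body_to_utf8(raw, "text/html;charset=ISO-8859-1")
puts utf8.encoding
puts utf8.valid_encoding?
```

With a real response you would pass response.body and response['Content-Type'] instead of the simulated values. The fallback is only a guess for servers that omit the charset, so adjust it to whatever you expect from the site.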
Upvotes: 2