sameers

Reputation: 5095

Is the Net::HTTP Ruby gem ignoring the Content-type header in my HTTP responses?

When using the Net::HTTP class (Module?), I seem to have a problem that even though the response sets the Content-Type header to have charset equal to ISO-8859-1, the response's encoding is ASCII-8BIT.

I am not 100% sure why or how these two encodings differ, but I do know that only the ISO-8859-1 encoding lets me transcode the body to UTF-8. To wit:

require 'net/http'
# uri is assumed to be a URI object for the page being fetched
# Net::HTTP.start returns the block's value, so capture it in response
response = Net::HTTP.start(uri.host, uri.port) do |http|
  request = Net::HTTP::Get.new uri
  http.request request
end
response['Content-Type']
 => "text/html;charset=ISO-8859-1"
response.body.encoding
 => #<Encoding:ASCII-8BIT>
response.body.encode(Encoding::UTF_8)
Encoding::UndefinedConversionError: "\xE9" from ASCII-8BIT to UTF-8

What is going on here? If I call force_encoding on the response's body with Encoding::ISO_8859_1, then the transcoding works.

Is Net::HTTP at fault?

Upvotes: 1

Views: 916

Answers (1)

Frederick Cheung

Reputation: 84124

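Net::HTTP does not set the encoding of the response body from the Content-Type header automatically (see ticket); the body is always tagged as ASCII-8BIT.

That name is slightly misleading, since it actually means "arbitrary binary data" (Encoding::BINARY is an alias for it). Ruby cannot transcode bytes above 0x7F out of binary data, because it does not know which characters they represent. This is why you need to use force_encoding to declare the real encoding before you can transcode to other encodings.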
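
As a rough sketch of that workaround, the snippet below reads the charset out of a Content-Type value, relabels the raw bytes with it, and then transcodes to UTF-8. The helper name body_as_utf8, its fallback parameter, and the sample bytes are made up for illustration; none of them are part of Net::HTTP, and falling back to ISO-8859-1 when no usable charset is present is only an assumption.

```ruby
# Hypothetical helper, not part of Net::HTTP: relabel a binary body using the
# charset named in a Content-Type value, then transcode it to UTF-8.
def body_as_utf8(body, content_type, fallback: Encoding::ISO_8859_1)
  # Pull the charset parameter out of e.g. "text/html;charset=ISO-8859-1".
  charset = content_type.to_s[/charset=["']?([^;"'\s]+)/i, 1]

  source = begin
    charset ? Encoding.find(charset) : fallback
  rescue ArgumentError
    # Encoding.find raises ArgumentError for names Ruby does not know.
    fallback
  end

  # force_encoding only changes the label on the bytes; encode converts them.
  # dup keeps the caller's string untouched.
  body.dup.force_encoding(source).encode(Encoding::UTF_8)
end

# Stand-in for response.body: the bytes of "café" in ISO-8859-1, tagged binary.
raw = "caf\xE9".b
utf8 = body_as_utf8(raw, "text/html;charset=ISO-8859-1")
```

With a real response, the call would presumably look like body_as_utf8(response.body, response['Content-Type']).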

Upvotes: 2
