Reputation: 1070
I am getting Encoding::UndefinedConversionError at /find/Wrocław "\xC5" from ASCII-8BIT to UTF-8
For some mysterious reason Sinatra is handing me the string tagged as ASCII-8BIT (binary) instead of UTF-8, as it should be.
I have found an ugly workaround. I don't know why Rack assumes the encoding is ASCII-8BIT, but calling string.force_encoding("UTF-8") on the value fixes it. Doing this for every param is tedious, though.
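One way to make the workaround less tedious is to apply it to the whole params structure in one pass. The following is only a minimal sketch: the helper name `force_utf8` is made up for illustration, and it assumes params are nested hashes, arrays, and strings whose bytes really are UTF-8.

```ruby
# Hypothetical helper (not part of Sinatra or Rack): relabel every string in a
# nested params structure as UTF-8. force_encoding only changes the encoding
# tag, not the bytes, so this assumes the bytes already are valid UTF-8.
def force_utf8(value)
  case value
  when String
    # dup first so frozen strings can be handled as well
    value.dup.force_encoding(Encoding::UTF_8)
  when Array
    value.map { |item| force_utf8(item) }
  when Hash
    value.each_with_object({}) { |(key, item), out| out[key] = force_utf8(item) }
  else
    value
  end
end

# Example input: the bytes as they might arrive, tagged ASCII-8BIT.
raw = { 'city' => "Wroc\xC5\x82aw".b, 'tags' => ["caf\xC3\xA9".b] }
fixed = force_utf8(raw)
puts fixed['city'].encoding
```

In a Sinatra app something like this could probably be called from a `before` filter, reassigning each value in `params`, so individual routes do not have to repeat the call.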
Upvotes: 3
Views: 2026
Reputation: 3534
AFAIK you are not supposed to have raw UTF-8 characters in URLs; you must percent-encode them. Not doing so will likely cause all kinds of issues with, say, standards-compliant proxies. It looks like this is not so much a Rack issue as a problem with the application emitting invalid URLs. The charset and encoding information in the HTTP header applies to the content, not to the header itself.
To quote RFC 3986
When a new URI scheme defines a component that represents textual data consisting of characters from the Universal Character Set [UCS], the data should first be encoded as octets according to the UTF-8 character encoding [STD63]; then only those octets that do not correspond to characters in the unreserved set should be percent-encoded. For example, the character A would be represented as "A", the character LATIN CAPITAL LETTER A WITH GRAVE would be represented as "%C3%80", and the character KATAKANA LETTER A would be represented as "%E3%82%A2".
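The rule quoted above can be tried out in Ruby. This is a small sketch using `ERB::Util.url_encode` from the standard library; the wrapper name `encode_segment` is an assumption made for the example, and it is meant for a single path segment, not a whole URL.

```ruby
require 'erb'

# Hypothetical wrapper: convert to UTF-8 octets first, then percent-encode
# every octet that is not in the unreserved set.
def encode_segment(text)
  ERB::Util.url_encode(text.encode(Encoding::UTF_8))
end

puts encode_segment('A')            # A
puts encode_segment("\u00C0")       # LATIN CAPITAL LETTER A WITH GRAVE: %C3%80
puts encode_segment("\u30A2")       # KATAKANA LETTER A: %E3%82%A2
puts encode_segment("Wroc\u0142aw") # Wroc%C5%82aw
```

The last line shows where the "\xC5" in the question's error comes from: it is the first of the two UTF-8 octets of "ł".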
Upvotes: 2
Reputation: 65232
I was having some similar problems with routing to "/protégés/:id". I posted to the Rack mailing list, but the response wasn't great.
The solution I came up with isn't perfect, but it works for most cases. First, create a middleware that unescapes the percent-encoded UTF-8 in the request path:
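# in lib/fix_unicode_urls_middleware.rb:
require 'cgi'

# Rack middleware that percent-decodes the request path variables.
class FixUnicodeUrlsMiddleware
  ENVIRONMENT_VARIABLES_TO_FIX = [
    'PATH_INFO', 'REQUEST_PATH', 'REQUEST_URI'
  ].freeze

  def initialize(app)
    @app = app
  end

  def call(env)
    ENVIRONMENT_VARIABLES_TO_FIX.each do |var|
      # Only touch values containing a percent escape (% plus two hex digits).
      # Note: CGI.unescape also turns "+" into a space.
      env[var] = CGI.unescape(env[var]) if env[var] =~ /%[0-9A-Fa-f]{2}/
    end
    @app.call(env)
  end
end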
Then use that middleware in your config/environment.rb (Rails 2.3) or config/application.rb (Rails 3).
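In Rails the registration is typically a `config.middleware.use FixUnicodeUrlsMiddleware` line, and in a config.ru or Sinatra app a plain `use FixUnicodeUrlsMiddleware`. To check the effect without booting a server, a middleware of this shape can be called directly with a hand-built env hash. The sketch below uses a stand-in class, `DecodePathMiddleware` (a hypothetical name, only handling `PATH_INFO`, and using `URI.decode_www_form_component` instead of `CGI.unescape`), so that it runs on its own.

```ruby
require 'uri'

# Stand-in with the same shape as the middleware above, repeated here only so
# the example is self-contained. The class name is hypothetical.
class DecodePathMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    path = env['PATH_INFO']
    if path && path.match?(/%[0-9A-Fa-f]{2}/)
      # Returns a UTF-8 tagged string; like CGI.unescape, it maps "+" to a space.
      env['PATH_INFO'] = URI.decode_www_form_component(path)
    end
    @app.call(env)
  end
end

# A Rack-style app is any object that responds to call(env).
inner_app = lambda do |env|
  [200, { 'Content-Type' => 'text/plain; charset=utf-8' }, [env['PATH_INFO']]]
end

stack = DecodePathMiddleware.new(inner_app)
status, _headers, body = stack.call({ 'PATH_INFO' => '/find/Wroc%C5%82aw' })
puts body.first.encoding
```

The inner app sees the decoded path tagged as UTF-8, which is what the route in the question needs.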
You'll also have to ensure the response declares the right encoding in its Content-Type HTTP header:
Content-Type: text/html; charset=utf-8
You can set that in Rails, in Rack, or in your web server, depending on how many different encodings you use on your site.
Upvotes: 3