Reputation: 3620
require 'uri'
uri = URI.parse 'http://dxczjjuegupb.cloudfront.net/wp-content/uploads/2017/10/Оуэн-Мэтьюс.jpg'
Browsers have no problem with http://dxczjjuegupb.cloudfront.net/wp-content/uploads/2017/10/Оуэн-Мэтьюс.jpg, so I'm wondering whether this Ruby class is a bit outdated. Should I abandon it completely, or add some error handling…
Upvotes: 44
Views: 23287
Reputation: 2333
https://bibwild.wordpress.com/2023/02/14/escaping-encoding-uri-components-in-ruby-3-2/
TL;DR
Ruby 3.2
require 'cgi'
url = "https://example.com/some/#{ CGI.escapeURIComponent path_component }" +
"?#{CGI.escapeURIComponent my_key}=#{CGI.escapeURIComponent my_value}"
< Ruby 3.2
require 'cgi'
CGI.escape(input).gsub("+", "%20")
or
require 'erb'
ERB::Util.url_encode(input)
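As a rough illustration of the pre-3.2 options above, here is a small sketch that builds a URL from separately encoded components. The values of path_component, my_key and my_value are made up for the example; they are not from the linked post.

```ruby
require 'cgi'
require 'erb'

# Hypothetical inputs, chosen only to show how spaces and non-ASCII text are handled.
path_component = 'Оуэн Мэтьюс.jpg'
my_key = 'q'
my_value = 'a b&c'

# CGI.escape targets form data, so a space becomes "+"; swap that for "%20".
form_style = CGI.escape(path_component)
path_style = form_style.gsub('+', '%20')

# ERB::Util.url_encode emits "%20" for a space directly.
erb_style = ERB::Util.url_encode(path_component)

# Encode each component on its own, then assemble the URL around them.
url = "https://example.com/some/#{erb_style}" \
      "?#{ERB::Util.url_encode(my_key)}=#{ERB::Util.url_encode(my_value)}"
puts url
```

The point of encoding per component is that a "/" or "&" inside a value gets escaped too, instead of being read as URL structure.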
Upvotes: -1
Reputation: 4622
uri = URI.parse(URI.escape(url)) # no longer works: removed in Ruby 3.0
uri = URI.parse(URI::Parser.new.escape(url))
URI.escape / URI.encode was removed in Ruby 3.0. This solution uses Ruby's own uri module rather than relying on a third-party gem.
Upvotes: 14
Reputation: 32315
With kudos to all the URI.escape answers (also known as URI.encode), these methods were officially made obsolete in Ruby 2.7, i.e. they now produce a visible "URI.escape is obsolete" warning message when you use them; previously they were merely deprecated. In Ruby 3.0 these methods were removed completely and are no longer available at all, not even with a warning.
Unfortunately, as far as I can tell, Ruby's standard library URI class does not offer any alternative for handling URIs containing non-ASCII characters, which are all too common these days - <sarcasm>now that the web has gone international</sarcasm>.
The best solution I came up with is the addressable gem, which contains the URI class we deserve - it handles everything the world has to throw at it, and you can get an "HTTP safe" URI using the #display_uri method:
Addressable::URI.parse("http://example.com/Оуэн-Мэтьюс.jpg")
=> #<Addressable::URI:0xc8 URI:http://example.com/Оуэн-Мэтьюс.jpg>
Addressable::URI.parse("http://example.com/Оуэн-Мэтьюс.jpg").display_uri.to_s
=> "http://example.com/%D0%9E%D1%83%D1%8D%D0%BD-%D0%9C%D1%8D%D1%82%D1%8C%D1%8E%D1%81.jpg"
Addressable::URI also comes with all kinds of goodies, such as port inference (you can tell whether the URL originally contained a port specification, or you can not care), and URL canonicalization (given a base URL, take a possibly relative URL and generate an absolute URL).
Here's how to use this with net/http:
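require 'net/http'
require 'addressable/uri'
url = Addressable::URI.parse("http://example.com/Оуэн-Мэтьюс.jpg")
response = Net::HTTP.start(url.host, url.inferred_port,
                           :use_ssl => url.scheme == 'https') do |http|
  req = Net::HTTP::Get.new(url.display_uri.request_uri)
  http.request(req) # send the request; its result becomes the value of response
end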
Upvotes: 20
Reputation: 18064
You can map the URL characters and escape the ones that are not ASCII. Something like this:
url.chars.map { |char| char.ascii_only? ? char : CGI.escape(char) }.join
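To make that concrete, here is a minimal sketch applying the one-liner to a URL like the one in the question. The method name escape_non_ascii is just something I made up for the example.

```ruby
require 'cgi'
require 'uri'

# Percent-encode only the non-ASCII characters; every ASCII character
# (including "/", ":" and "?") is passed through unchanged.
def escape_non_ascii(url)
  url.chars.map { |char| char.ascii_only? ? char : CGI.escape(char) }.join
end

escaped = escape_non_ascii('http://example.com/Оуэн-Мэтьюс.jpg')
uri = URI.parse(escaped) # no longer raises URI::InvalidURIError
puts uri
```

Keep in mind that a space is ASCII, so it is left as-is here, and URI.parse will still reject a URL that contains one.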
Upvotes: 4
Reputation: 908
What do you think about:
url = URI.escape(url) unless url.ascii_only? # note: URI.escape was removed in Ruby 3.0
URI.parse(url)
Upvotes: 9
Reputation: 847
I had the same error:
Ruby: URI::InvalidURIError (URI must be ascii only
with my code, but the cause in my case was that it was an old project and the i18n gem was outdated. It was solved with a simple:
bundle update
Upvotes: 1
Reputation: 3620
The answer just came to me by asking myself the question:
begin
  uri = URI.parse(url)
rescue URI::InvalidURIError
  uri = URI.parse(URI.escape(url))
end
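Since URI.escape is gone as of Ruby 3.0, here is a sketch of the same rescue-and-retry idea that should still run on newer Rubies. It calls escape on a parser instance, as another answer suggests; I am assuming URI::RFC2396_Parser here, which as far as I know is the class URI::Parser referred to when that answer was written. The name parse_lenient is made up for the example.

```ruby
require 'uri'

# Hypothetical helper: parse a URL, percent-encoding it first only if
# URI.parse rejects it.
def parse_lenient(url)
  URI.parse(url)
rescue URI::InvalidURIError
  # URI.escape no longer exists on Ruby 3.0+, so use a parser instance instead.
  URI.parse(URI::RFC2396_Parser.new.escape(url))
end

uri = parse_lenient('http://example.com/Оуэн-Мэтьюс.jpg')
puts uri.path
```

URLs that already parse go through untouched, so only the failing ones pay for the extra escaping step.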
Upvotes: 49