Reputation: 5666
While using URI.parse on a URL, I was confronted with an error message:
URI::InvalidURIError: URI must be ascii only
I found a StackOverflow question that recommended using URI.escape (also available under the alias URI.encode), which works. Using the URL in that question as an example:
URI.parse('http://dxczjjuegupb.cloudfront.net/wp-content/uploads/2017/10/Оуэн-Мэтьюс.jpg')
=> URI::InvalidURIError: URI must be ascii only "http://dxczjjuegupb.cloudfront.net/wp-content/uploads/2017/10/\u041E\u0443\u044D\u043D-\u041C\u044D\u0442\u044C\u044E\u0441.jpg"
URI.encode('http://dxczjjuegupb.cloudfront.net/wp-content/uploads/2017/10/Оуэн-Мэтьюс.jpg')
=> "http://dxczjjuegupb.cloudfront.net/wp-content/uploads/2017/10/%D0%9E%D1%83%D1%8D%D0%BD-%D0%9C%D1%8D%D1%82%D1%8C%D1%8E%D1%81.jpg"
However, URI.escape is obsolete, as Rubocop warns:
URI.escape method is obsolete and should not be used. Instead, use CGI.escape, URI.encode_www_form or URI.encode_www_form_component depending on your specific use case.
But while URI.escape gives us a usable result, the alternatives don’t:
CGI.escape('http://dxczjjuegupb.cloudfront.net/wp-content/uploads/2017/10/Оуэн-Мэтьюс.jpg')
=> "http%3A%2F%2Fdxczjjuegupb.cloudfront.net%2Fwp-content%2Fuploads%2F2017%2F10%2F%D0%9E%D1%83%D1%8D%D0%BD-%D0%9C%D1%8D%D1%82%D1%8C%D1%8E%D1%81.jpg"
This is a bother because in my case I’m constructing a URL from data I get via Nokogiri:
my_url = page.at('.someclass').at('img').attr('src')
I only need to escape the last part of the resulting URL, but CGI.escape and similar methods transform the whole string, including necessary characters such as : and /. Getting the escaped result therefore becomes a multi-line ordeal: I have to split the path and juggle several variables to achieve what previously took a single call to URI.escape.
Is there a simple alternative I’m not seeing? It needs to be done without external gems.
Upvotes: 3
Views: 1348
Reputation: 434606
I tend to use Addressable for parsing URLs since the standard URI has flaws:
require 'addressable'
Addressable::URI.parse('http://dxczjjuegupb.cloudfront.net/wp-content/uploads/2017/10/Оуэн-Мэтьюс.jpg')
# #<Addressable::URI:0x3fc37ecc1c40 URI:http://dxczjjuegupb.cloudfront.net/wp-content/uploads/2017/10/Оуэн-Мэтьюс.jpg>
Addressable::URI.parse('http://dxczjjuegupb.cloudfront.net/wp-content/uploads/2017/10/Оуэн-Мэтьюс.jpg').path
# "/wp-content/uploads/2017/10/Оуэн-Мэтьюс.jpg"
It isn't part of the Ruby core or the standard library, but it should be, and it always ends up in my Gemfiles.
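If an external gem really isn't an option, one possible workaround is to percent-encode only the non-ASCII characters yourself and leave everything else (including : and /) alone. The following is a minimal sketch, not a drop-in replacement for URI.escape: escape_non_ascii is a hypothetical helper name, it assumes the input string is UTF-8 encoded, and it deliberately does not touch ASCII characters such as spaces, which URI.escape would also have encoded.

```ruby
require 'uri'

# Hypothetical helper (not part of the standard library): percent-encode
# only the non-ASCII characters of a URL, leaving ':' and '/' untouched.
# Assumes the string is UTF-8 encoded.
def escape_non_ascii(url)
  url.gsub(/[^[:ascii:]]/) do |char|
    # Encode each UTF-8 byte of the character as %XX.
    char.bytes.map { |byte| format('%%%02X', byte) }.join
  end
end

url = 'http://dxczjjuegupb.cloudfront.net/wp-content/uploads/2017/10/Оуэн-Мэтьюс.jpg'
escaped = escape_non_ascii(url)

# The escaped string is ASCII-only, so URI.parse should accept it.
parsed = URI.parse(escaped)

puts escaped
puts parsed.path
```

For the example URL this should give the same string that URI.escape produced in the question, since the only characters that needed escaping there were the Cyrillic ones in the file name.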
Upvotes: 2