user137369

Reputation: 5666

Simple alternative to URI.escape

While using URI.parse on a URL, I was confronted with an error message:

URI::InvalidURIError: URI must be ascii only

I found a Stack Overflow question that recommended using URI.escape (also available under the alias URI.encode), which works. Using the URL in that question as an example:

URI.parse('http://dxczjjuegupb.cloudfront.net/wp-content/uploads/2017/10/Оуэн-Мэтьюс.jpg')
=> URI::InvalidURIError: URI must be ascii only "http://dxczjjuegupb.cloudfront.net/wp-content/uploads/2017/10/\u041E\u0443\u044D\u043D-\u041C\u044D\u0442\u044C\u044E\u0441.jpg"

URI.encode('http://dxczjjuegupb.cloudfront.net/wp-content/uploads/2017/10/Оуэн-Мэтьюс.jpg')
=> "http://dxczjjuegupb.cloudfront.net/wp-content/uploads/2017/10/%D0%9E%D1%83%D1%8D%D0%BD-%D0%9C%D1%8D%D1%82%D1%8C%D1%8E%D1%81.jpg"

However, URI.escape is obsolete, as RuboCop warns:

URI.escape method is obsolete and should not be used. Instead, use CGI.escape, URI.encode_www_form or URI.encode_www_form_component depending on your specific use case.

But while URI.escape gives us a usable result, the alternatives don’t:

CGI.escape('http://dxczjjuegupb.cloudfront.net/wp-content/uploads/2017/10/Оуэн-Мэтьюс.jpg')
=> "http%3A%2F%2Fdxczjjuegupb.cloudfront.net%2Fwp-content%2Fuploads%2F2017%2F10%2F%D0%9E%D1%83%D1%8D%D0%BD-%D0%9C%D1%8D%D1%82%D1%8C%D1%8E%D1%81.jpg"

This is a bother because in my case I’m constructing a URL from data I get via Nokogiri:

my_url = page.at('.someclass').at('img').attr('src')

I only need to escape the last part of the resulting URL, but CGI.escape and similar methods transform the whole string, including necessary characters such as : and /. Getting the escaped result therefore becomes a multi-line ordeal: I have to split the path and juggle several variables to achieve what a single call to URI.escape used to do.

Is there a simple alternative I’m not seeing? It needs to be done without external gems.

Upvotes: 3

Views: 1348

Answers (1)

mu is too short

Reputation: 434606

I tend to use Addressable for parsing URLs since the standard URI has flaws:

require 'addressable'

Addressable::URI.parse('http://dxczjjuegupb.cloudfront.net/wp-content/uploads/2017/10/Оуэн-Мэтьюс.jpg')
#<Addressable::URI:0x3fc37ecc1c40 URI:http://dxczjjuegupb.cloudfront.net/wp-content/uploads/2017/10/Оуэн-Мэтьюс.jpg> 

Addressable::URI.parse('http://dxczjjuegupb.cloudfront.net/wp-content/uploads/2017/10/Оуэн-Мэтьюс.jpg').path
# "/wp-content/uploads/2017/10/Оуэн-Мэтьюс.jpg" 

It isn't part of the Ruby core or the standard library, but it should be, and it always ends up in my Gemfiles.
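If an external gem is not an option, as the question requires, one possible standard-library workaround is to percent-encode only the non-ASCII runs of the string and leave everything else alone. The following is a minimal sketch, not an official replacement for URI.escape; the helper name escape_non_ascii is hypothetical, and it assumes the input is a UTF-8 string whose ASCII portion is already a valid URL.

```ruby
require 'uri'

# Hypothetical helper (not part of the standard library): percent-encode
# only the runs of non-ASCII characters, leaving ':', '/', '-' and every
# other ASCII character untouched.
def escape_non_ascii(url)
  url.gsub(/[^[:ascii:]]+/) { |run| URI.encode_www_form_component(run) }
end

url = 'http://dxczjjuegupb.cloudfront.net/wp-content/uploads/2017/10/Оуэн-Мэтьюс.jpg'
escaped = escape_non_ascii(url)
puts escaped

uri = URI.parse(escaped) # no longer raises URI::InvalidURIError
puts uri.path
```

Unlike URI.escape, this sketch does not touch unsafe ASCII characters such as spaces, so it only addresses the "URI must be ascii only" error discussed above.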

Upvotes: 2
