HRÓÐÓLFR
HRÓÐÓLFR

Reputation: 5982

How to URL encode a string in Ruby

How do I URI::encode a string like:

\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a

to get it in a format like:

%124Vx%9A%BC%DE%F1%23Eg%89%AB%CD%EF%124Vx%9A

as per RFC 1738?

Here's what I tried:

irb(main):123:0> URI::encode "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a"
ArgumentError: invalid byte sequence in UTF-8
    from /usr/local/lib/ruby/1.9.1/uri/common.rb:219:in `gsub'
    from /usr/local/lib/ruby/1.9.1/uri/common.rb:219:in `escape'
    from /usr/local/lib/ruby/1.9.1/uri/common.rb:505:in `escape'
    from (irb):123
    from /usr/local/bin/irb:12:in `<main>'

Also:

irb(main):126:0> CGI::escape "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a"
ArgumentError: invalid byte sequence in UTF-8
    from /usr/local/lib/ruby/1.9.1/cgi/util.rb:7:in `gsub'
    from /usr/local/lib/ruby/1.9.1/cgi/util.rb:7:in `escape'
    from (irb):126
    from /usr/local/bin/irb:12:in `<main>'

I looked all about the internet and haven't found a way to do this, although I am almost positive that the other day I did this without any trouble at all.

Upvotes: 177

Views: 219586

Answers (8)

kangkyu
kangkyu

Reputation: 6090

I was originally trying to escape special characters in a file name only, not on the path, from a full URL string.

ERB::Util.url_encode didn't work for my use:

helper.send(:url_encode, "http://example.com/?a=\11\15")
# => "http%3A%2F%2Fexample.com%2F%3Fa%3D%09%0D"

Based on two answers in "https://stackoverflow.com/questions/34274838/why-is-uri-escape-marked-as-obsolete-and-where-is-this-regexpunsafe-constant", it looks like URI::RFC2396_Parser#escape is better than using URI::Escape#escape. However, they both are behaving the same to me:

URI.escape("http://example.com/?a=\11\15")
# => "http://example.com/?a=%09%0D"
URI::Parser.new.escape("http://example.com/?a=\11\15")
# => "http://example.com/?a=%09%0D"

UPDATE: I think it's from Ruby 3.0, URI.escape does not work any more. I have not found replacement except URI::Parser.new.escape yet.

Upvotes: 14

Jenner La Fave
Jenner La Fave

Reputation: 1427

Nowadays, you should use ERB::Util.url_encode or CGI.escape. The primary difference between them is their handling of spaces:

>> ERB::Util.url_encode("foo/bar? baz&")
=> "foo%2Fbar%3F%20baz%26"

>> CGI.escape("foo/bar? baz&")
=> "foo%2Fbar%3F+baz%26"

CGI.escape follows the CGI/HTML forms spec and gives you an application/x-www-form-urlencoded string, which requires spaces be escaped to +, whereas ERB::Util.url_encode follows RFC 3986, which requires them to be encoded as %20.

See "What's the difference between URI.escape and CGI.escape?" for more discussion.

Upvotes: 121

kain
kain

Reputation: 5570

str = "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a".force_encoding('ASCII-8BIT')
puts CGI.escape str


=> "%124Vx%9A%BC%DE%F1%23Eg%89%AB%CD%EF%124Vx%9A"

Upvotes: 208

If you want to "encode" a full URL without having to think about manually splitting it into its different parts, I found the following worked in the same way that I used to use URI.encode:

URI.parse(my_url).to_s

Upvotes: 2

foomip
foomip

Reputation: 574

I created a gem to make URI encoding stuff cleaner to use in your code. It takes care of binary encoding for you.

Run gem install uri-handler, then use:

require 'uri-handler'

str = "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a".to_uri
# => "%124Vx%9A%BC%DE%F1%23Eg%89%AB%CD%EF%124Vx%9A"

It adds the URI conversion functionality into the String class. You can also pass it an argument with the optional encoding string you would like to use. By default it sets to encoding 'binary' if the straight UTF-8 encoding fails.

Upvotes: 6

Aleksey Shein
Aleksey Shein

Reputation: 7482

You can use Addressable::URI gem for that:

require 'addressable/uri'   
string = '\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a'
Addressable::URI.encode_component(string, Addressable::URI::CharacterClasses::QUERY)
# "%5Cx12%5Cx34%5Cx56%5Cx78%5Cx9a%5Cxbc%5Cxde%5Cxf1%5Cx23%5Cx45%5Cx67%5Cx89%5Cxab%5Cxcd%5Cxef%5Cx12%5Cx34%5Cx56%5Cx78%5Cx9a" 

It uses more modern format, than CGI.escape, for example, it properly encodes space as %20 and not as + sign, you can read more in "The application/x-www-form-urlencoded type" on Wikipedia.

2.1.2 :008 > CGI.escape('Hello, this is me')
 => "Hello%2C+this+is+me" 
2.1.2 :009 > Addressable::URI.encode_component('Hello, this is me', Addressable::URI::CharacterClasses::QUERY)
 => "Hello,%20this%20is%20me" 

Upvotes: 12

Thiago Falcao
Thiago Falcao

Reputation: 5003

Code:

str = "http://localhost/with spaces and spaces"
encoded = URI::encode(str)
puts encoded

Result:

http://localhost/with%20spaces%20and%20spaces

Upvotes: 8

Jared Beck
Jared Beck

Reputation: 17528

str = "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a"
require 'cgi'
CGI.escape(str)
# => "%124Vx%9A%BC%DE%F1%23Eg%89%AB%CD%EF%124Vx%9A"

Taken from @J-Rou's comment

Upvotes: 75

Related Questions