Kulgar

Reputation: 1865

Ruby - how to encode URL without re-encoding already encoded characters

I have a simple problem: users can post URLs through a specific input in a form on my website. I would like to encode the posted URL, because users sometimes send URLs containing unusual and/or non-ASCII characters (like é, à, ç...). For instance: https://www.example.com/url-déjà-vu

So I tried URI.escape('https://www.example.com/url-déjà-vu'), which does work, but with the following URL:

URI.escape('https://somesite.com/page?stuff=stuff&%20')
=> "https://somesite.com/page?stuff=stuff&%2520"

The % character gets encoded, but it should not be, because %20 is already an encoded character (a space). Then I thought I could do this:

URI.escape(URI.decode('https://somesite.com/page?stuff=stuff&%20'))
=> "https://somesite.com/page?stuff=stuff&%20"

But this breaks if the URL contains an encoded "/", for instance:

URI.escape(URI.decode('http://example.com/a%2fb'))
=> "http://example.com/a/b"

The "/" should stay encoded.
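The underlying issue is that decoding is lossy: once %2f has been decoded, the result is indistinguishable from a URL that contained a literal "/" all along, so no later escape call can restore it. A quick check, assuming Ruby 3.x, where URI.decode no longer exists, so URI.decode_www_form_component stands in for it (that method also turns "+" into a space, which does not affect these inputs):

```ruby
require "uri"

encoded_slash = "http://example.com/a%2fb"
literal_slash = "http://example.com/a/b"

# Decoding the encoded form yields exactly the literal form,
# so a subsequent escape has no way of telling the two apart.
decoded = URI.decode_www_form_component(encoded_slash)
puts decoded                  # => http://example.com/a/b
puts decoded == literal_slash # => true
```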

So, putting it all together: in Ruby, I want to encode URLs posted by users while leaving already-encoded characters unchanged. Any idea how I can do that without getting a headache?

Thanks :)

Upvotes: 4

Views: 2804

Answers (1)

Jordan Running

Reputation: 106027

I can't think of a way to do this that isn't a little bit of a kludge. So I propose a little bit of a kludge.

URI.escape (also available under the alias URI.encode) appears to work the way you want in all cases except when characters are already percent-encoded. With that in mind, we can take the result of URI.encode and use String#gsub to "un-encode" only those double-encoded sequences.

The regular expression below looks for %25 (an encoded %) followed by two hex digits, turning e.g. %252f back into %2f:

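require "uri"

# Matches "%25" (an encoded "%") followed by two hex digits, e.g. "%252f".
DOUBLE_ESCAPED_EXPR = /%25([0-9a-f]{2})/i

# Note: URI.encode (an alias of URI.escape) was removed in Ruby 3.0,
# so this version requires Ruby 2.x.
def escape_uri(uri)
  # Escape everything, then turn each "%25XX" back into "%XX".
  URI.encode(uri).gsub(DOUBLE_ESCAPED_EXPR, '%\1')
end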

puts escape_uri("https://www.example.com/url-déjà-vu")
# => https://www.example.com/url-d%C3%A9j%C3%A0-vu

puts escape_uri("https://somesite.com/page?stuff=stuff&%20")
# => https://somesite.com/page?stuff=stuff&%20

puts escape_uri("http://example.com/a%2fb")
# => http://example.com/a%2fb

I don't promise that this is foolproof, but hopefully it helps.
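On Ruby 3.0 and later, where URI.escape/URI.encode no longer exist, a possible alternative is to do the whole job in a single String#gsub pass: leave every well-formed %XX sequence untouched and percent-encode (as UTF-8 bytes) any character outside an allowed set. This is a sketch under my own assumptions: the allowed set (RFC 3986 unreserved plus reserved characters) and the constant name ESCAPE_OR_UNSAFE are illustrative choices, not part of the uri library, and the set is not guaranteed to match URI.escape's default character for character.

```ruby
# Matches either an already well-formed percent-escape (%XX), which is kept,
# or one character outside the assumed allowed set (RFC 3986 unreserved +
# reserved characters). A lone "%" is not in the set, so it gets encoded.
ESCAPE_OR_UNSAFE = /%\h\h|[^A-Za-z0-9\-._~:\/?#\[\]@!$&'()*+,;=]/

def escape_uri(uri)
  uri.gsub(ESCAPE_OR_UNSAFE) do |match|
    if match.length == 3
      # Only the %XX alternative can produce a 3-character match: keep it.
      match
    else
      # Encode each byte of the character's UTF-8 form as %XX (uppercase hex).
      match.encode("UTF-8").bytes.map { |byte| format("%%%02X", byte) }.join
    end
  end
end

puts escape_uri("https://www.example.com/url-déjà-vu")
# => https://www.example.com/url-d%C3%A9j%C3%A0-vu
puts escape_uri("https://somesite.com/page?stuff=stuff&%20")
# => https://somesite.com/page?stuff=stuff&%20
puts escape_uri("http://example.com/a%2fb")
# => http://example.com/a%2fb
puts escape_uri("http://example.com/50% off")
# => http://example.com/50%25%20off
```

The same caveat applies: a "%" that happens to be followed by two hex digits is always treated as an existing escape, even if the user meant a literal percent sign.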

Upvotes: 6
