
Reputation: 1447

URL encoding the character @ in query path

There are places/libraries that seem to consider "@" characters in a URL Path segment as "special character" that should be encoded, and places/libraries that do not.

I want to find out which behavior is correct. Example string: "someone@example.com".

If I go to https://www.urlencoder.org/ and encode the above string, I get someone%40example.com.
If I use org.springframework.web.util.UriUtils, I get these results:

String s1 = UriUtils.encodePathSegment("someone@example.com", "UTF-8");
String s2 = UriUtils.encodeQueryParam("someone@example.com", "UTF-8");
String s3 = UriUtils.encodePath("someone@example.com", "UTF-8");
System.out.println("----------s1: " + s1);
System.out.println("----------s2: " + s2);
System.out.println("----------s3: " + s3);

...outputs

----------s1: someone@example.com
----------s2: someone@example.com
----------s3: someone@example.com

RestEasy-Client v4.0.0.Final does not encode the "@" character in path segments.
WSO2 ESB complains when it receives a path parameter that contains the "@" character (more precisely, it cannot find the resource in that case).

Who is right? What is the correct outcome: should "@" be transformed to "%40" or not?

Upvotes: 0

Answers (2)

VoiceOfUnreason

Reputation: 57297

There are places/libraries that seem to consider "@" characters in a URL Path segment as "special character" that should be encoded, and places/libraries that do not.

The standard for which characters must be escaped in a path segment is RFC 3986, Appendix A.

path          = path-abempty    ; begins with "/" or is empty
              / path-absolute   ; begins with "/" but not "//"
              / path-noscheme   ; begins with a non-colon segment
              / path-rootless   ; begins with a segment
              / path-empty      ; zero characters

path-abempty  = *( "/" segment )
path-absolute = "/" [ segment-nz *( "/" segment ) ]
path-noscheme = segment-nz-nc *( "/" segment )
path-rootless = segment-nz *( "/" segment )
path-empty    = 0<pchar>

Notice that, depending on the path production you are using, there are three different flavors of segment:

segment       = *pchar
segment-nz    = 1*pchar
segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
              ; non-zero-length segment without any colon ":"

but...

pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

So @ is allowed in any path segment.
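To make the grammar above concrete, here is a minimal sketch that transcribes the pchar production into a regular expression and checks a single segment against it. The class and method names (PathSegmentCheck, isValidSegment) are made up for illustration; this is not how any particular library validates paths, only a direct reading of the ABNF.

```java
import java.util.regex.Pattern;

// Hypothetical helper: a direct transcription of RFC 3986's
//   segment = *pchar
//   pchar   = unreserved / pct-encoded / sub-delims / ":" / "@"
final class PathSegmentCheck {
    // unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
    private static final String UNRESERVED = "A-Za-z0-9\\-._~";
    // sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
    private static final String SUB_DELIMS = "!$&'()*+,;=";
    // Either one allowed literal character or a pct-encoded triplet, repeated.
    private static final Pattern SEGMENT = Pattern.compile(
            "(?:[" + UNRESERVED + SUB_DELIMS + ":@]|%[0-9A-Fa-f]{2})*");

    static boolean isValidSegment(String segment) {
        return SEGMENT.matcher(segment).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidSegment("someone@example.com"));   // literal "@"
        System.out.println(isValidSegment("someone%40example.com")); // pct-encoded "@"
        System.out.println(isValidSegment("some one"));              // raw space is not a pchar
    }
}
```

Both the literal and the percent-encoded form pass, which matches the conclusion that "@" does not have to be escaped in a path segment.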

Is the literal form required? As far as I can tell, the answer is no -- using the pct-encoded representation (%40) instead is permitted when @ is not serving the role of a delimiter. There's nothing explicit, but this observation about unreserved characters in RFC 3986, Section 2.4 is a hint:

When a URI is dereferenced, the components and subcomponents significant to the scheme-specific dereferencing process (if any) must be parsed and separated before the percent-encoded octets within those components can be safely decoded, as otherwise the data may be mistaken for component delimiters. The only exception is for percent-encoded octets corresponding to characters in the unreserved set, which can be decoded at any time. For example, the octet corresponding to the tilde ("~") character is often encoded as "%7E" by older URI processing implementations; the "%7E" can be replaced by "~" without changing its interpretation.

This suggests that pct-encoding an unreserved character is permitted, even though it is clearly not required. The same should hold for other characters, such as @, once the components have been parsed and the delimiters resolved.

For reference: the unreserved set is pretty much what you would expect.

unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
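As a rough illustration of the point about equivalent representations, the sketch below parses two URIs with the JDK's java.net.URI and compares the raw and decoded paths. The host and path (example.com, /users/...) and the class name are placeholders chosen for this example. It only shows what one parser does once the path component has been separated out; a server or gateway may still treat the two raw forms differently.

```java
import java.net.URI;

// Hypothetical demo class; the URIs are made-up examples.
final class AtSignEquivalence {
    // Path exactly as written in the URI, percent-escapes untouched.
    static String rawPath(String uri) {
        return URI.create(uri).getRawPath();
    }

    // Path after percent-escapes have been decoded.
    static String decodedPath(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        String literal = "http://example.com/users/someone@example.com";
        String encoded = "http://example.com/users/someone%40example.com";

        // The raw forms differ...
        System.out.println(rawPath(literal));
        System.out.println(rawPath(encoded));
        // ...but both decode to the same path data.
        System.out.println(decodedPath(literal).equals(decodedPath(encoded)));
    }
}
```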

Upvotes: 1

luckygulli

Reputation: 21

If you call a URL like login(:password)@url.com, it will connect you to that endpoint with your credentials, so I would not escape the @ at that point. But if it appears after the .com, I would escape it, because there it should not be used as a separator.
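The two roles of "@" described above can be seen by letting java.net.URI split a URI into its components. This is only a sketch: the scheme, the credentials, and the class name are placeholders for illustration. The first "@" sits in the authority and separates the user info from the host, while the second one comes after the first "/" and is therefore plain path data.

```java
import java.net.URI;

// Hypothetical demo class; login, password and url.com are placeholders.
final class AtSignRoles {
    static String userInfo(String uri) {
        return URI.create(uri).getUserInfo();
    }

    static String host(String uri) {
        return URI.create(uri).getHost();
    }

    static String path(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        String uri = "http://login:password@url.com/someone@example.com";
        System.out.println("userinfo: " + userInfo(uri)); // part before the delimiter "@"
        System.out.println("host:     " + host(uri));
        System.out.println("path:     " + path(uri));     // the "@" here is ordinary data
    }
}
```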

Upvotes: 0
