Reputation: 1447
There are places/libraries that seem to consider "@" characters in a URL Path segment as "special character" that should be encoded, and places/libraries that do not.
I would like to know which behavior is correct. Example string: "[email protected]".
If I am using org.springframework.web.util.UriUtils I get these results:
String s1 = UriUtils.encodePathSegment("[email protected]", "UTF-8");
String s2 = UriUtils.encodeQueryParam("[email protected]", "UTF-8");
String s3 = UriUtils.encodePath("[email protected]", "UTF-8");
System.out.println("----------s1: " + s1);
System.out.println("----------s2: " + s2);
System.out.println("----------s3: " + s3);
...outputs
----------s1: [email protected]
----------s2: [email protected]
----------s3: [email protected]
Which is right? Should "@" in a path segment be transformed to "%40" or not?
Upvotes: 0
Views: 1085
Reputation: 57297
There are places/libraries that seem to consider "@" characters in a URL Path segment as "special character" that should be encoded, and places/libraries that do not.
The standard for which characters must be escaped in a path segment is RFC 3986, Appendix A.
path          = path-abempty    ; begins with "/" or is empty
              / path-absolute   ; begins with "/" but not "//"
              / path-noscheme   ; begins with a non-colon segment
              / path-rootless   ; begins with a segment
              / path-empty      ; zero characters

path-abempty  = *( "/" segment )
path-absolute = "/" [ segment-nz *( "/" segment ) ]
path-noscheme = segment-nz-nc *( "/" segment )
path-rootless = segment-nz *( "/" segment )
path-empty    = 0<pchar>
Notice that, depending on the path production you are using, there are three different flavors of segment:
segment       = *pchar
segment-nz    = 1*pchar
segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
              ; non-zero-length segment without any colon ":"
but...
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
So @ is allowed in any path segment.
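To make the grammar concrete, here is a minimal Java sketch of the pchar rule quoted above. The class and method names (PathChars, isUnreserved, isLiteralPchar) are made up for illustration and are not part of Spring or the JDK. It only checks single literal characters and does not handle pct-encoded triplets.

```java
public class PathChars {
    // sub-delims and the non-alphanumeric unreserved characters, per RFC 3986 Appendix A
    private static final String SUB_DELIMS = "!$&'()*+,;=";
    private static final String UNRESERVED_MARKS = "-._~";

    // unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
    static boolean isUnreserved(char c) {
        return (c >= 'A' && c <= 'Z')
            || (c >= 'a' && c <= 'z')
            || (c >= '0' && c <= '9')
            || UNRESERVED_MARKS.indexOf(c) >= 0;
    }

    // pchar without the pct-encoded alternative:
    // unreserved / sub-delims / ":" / "@"
    static boolean isLiteralPchar(char c) {
        return isUnreserved(c)
            || SUB_DELIMS.indexOf(c) >= 0
            || c == ':'
            || c == '@';
    }

    public static void main(String[] args) {
        // '@' and ':' are allowed literally; '/', '?', '#' and space are not
        for (char c : new char[] {'@', ':', '/', '?', '#', ' '}) {
            System.out.println("'" + c + "' literal pchar: " + isLiteralPchar(c));
        }
    }
}
```

By this check, an encoder that leaves "@" alone in a path segment is following the grammar, while one that emits "%40" is using the pct-encoded alternative.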
Is the literal form required? As far as I can tell, the answer is no -- using the pct-encoded representation (%40) instead is permitted when @ is not serving the role of a delimiter. There's nothing explicit, but this observation about unreserved characters is a hint:
When a URI is dereferenced, the components and subcomponents significant to the scheme-specific dereferencing process (if any) must be parsed and separated before the percent-encoded octets within those components can be safely decoded, as otherwise the data may be mistaken for component delimiters. The only exception is for percent-encoded octets corresponding to characters in the unreserved set, which can be decoded at any time. For example, the octet corresponding to the tilde ("~") character is often encoded as "%7E" by older URI processing implementations; the "%7E" can be replaced by "~" without changing its interpretation.
This suggests that percent-encoding an unreserved character is permitted, even though it is clearly not required. The same should then hold for other characters, such as @, once the delimiters have been resolved.
For reference: the unreserved set is pretty much what you would expect.
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
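As a rough check of the "both forms are acceptable" reading, the sketch below parses a path with a literal @ and one with %40 and compares the decoded results. The class name AtSignEquivalence, the helper decodedPath, and the example.com URLs are made up for illustration. Note also that java.net.URI is documented against RFC 2396 (with amendments) rather than RFC 3986, so this only shows how one parser behaves.

```java
import java.net.URI;

public class AtSignEquivalence {
    // Returns the path component with percent-encoded octets decoded
    static String decodedPath(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        String literal = "http://example.com/users/alice@home";
        String encoded = "http://example.com/users/alice%40home";

        // The raw paths differ...
        System.out.println(URI.create(literal).getRawPath()); // /users/alice@home
        System.out.println(URI.create(encoded).getRawPath()); // /users/alice%40home

        // ...but both decode to the same path
        System.out.println(decodedPath(literal).equals(decodedPath(encoded))); // true
    }
}
```

A server that decodes the path after splitting it into segments would see the same segment either way, which is consistent with libraries disagreeing on whether to encode "@" without either being wrong.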
Upvotes: 1
Reputation: 21
If you call a URL like login(:password)@url.com, it will connect you to that endpoint with your credentials. So I would not escape the @ at that point. But if an @ appears after the .com, I would escape it, because it should not be used as a separator there.
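For what it's worth, here is a small sketch of how java.net.URI splits such a URL. The class name UserInfoDemo, the helper parts, and the example URL are made up for illustration. In this parser the @ only acts as the userinfo delimiter inside the authority (before the first "/"), so an @ later in the path is not mistaken for a separator whether or not it is escaped.

```java
import java.net.URI;

public class UserInfoDemo {
    // Returns { userinfo, host, decoded path } for the given URL
    static String[] parts(String url) {
        URI u = URI.create(url);
        return new String[] { u.getUserInfo(), u.getHost(), u.getPath() };
    }

    public static void main(String[] args) {
        // First '@' separates credentials from the host; second '@' is plain path data
        String[] p = parts("http://login:password@example.com/files/a@b");
        System.out.println("userinfo: " + p[0]); // login:password
        System.out.println("host:     " + p[1]); // example.com
        System.out.println("path:     " + p[2]); // /files/a@b
    }
}
```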
Upvotes: 0