Andrei

Reputation: 1447

URL-encoding the character "@" in a URL path

There are places/libraries that seem to consider "@" characters in a URL path segment to be special characters that should be encoded, and places/libraries that do not.

I would like to find out which behavior is correct. Example string: "[email protected]".

...outputs

----------s1: [email protected]
----------s2: [email protected]
----------s3: [email protected]

Who is right? What is the correct outcome: should "@" be transformed to "%40" or not?
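Both behaviors can come out of a single library depending on configuration. In Python's standard library, `urllib.parse.quote` percent-encodes "@" by default and leaves it literal only when the caller adds it to the `safe` argument. A minimal sketch; the segment "user@example" is a hypothetical stand-in for the example string:

```python
from urllib.parse import quote

# Hypothetical path segment containing "@".
segment = "user@example"

# Default safe="/": "@" is not in quote()'s always-safe set, so it becomes %40.
encoded_default = quote(segment)

# Adding "@" to safe leaves it as a literal character.
encoded_keep_at = quote(segment, safe="/@")

print(encoded_default)  # user%40example
print(encoded_keep_at)  # user@example
```

So a difference between libraries may simply reflect a different choice of "safe" characters rather than a different reading of the standard.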

Upvotes: 0

Views: 1085

Answers (2)

VoiceOfUnreason

Reputation: 57297

There are places/libraries that seem to consider "@" characters in a URL Path segment as "special character" that should be encoded, and places/libraries that do not.

The standard for which characters must be escaped in a path segment is RFC 3986, Appendix A.

path          = path-abempty    ; begins with "/" or is empty
              / path-absolute   ; begins with "/" but not "//"
              / path-noscheme   ; begins with a non-colon segment
              / path-rootless   ; begins with a segment
              / path-empty      ; zero characters

path-abempty  = *( "/" segment )
path-absolute = "/" [ segment-nz *( "/" segment ) ]
path-noscheme = segment-nz-nc *( "/" segment )
path-rootless = segment-nz *( "/" segment )
path-empty    = 0<pchar>

Notice that, depending on which path production you are using, there are three different flavors of segment:

segment       = *pchar
segment-nz    = 1*pchar
segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
              ; non-zero-length segment without any colon ":"

but...

pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

So @ is allowed in any path segment.
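The segment productions can be checked mechanically. A minimal sketch that translates the ABNF above into Python regular expressions (the pattern names and the `matches` helper are hypothetical, not part of any library):

```python
import re

# Character classes from RFC 3986, Appendix A.
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|{PCT_ENCODED})"
# segment-nz-nc uses the same set minus ":" (it still allows "@")
PCHAR_NC = rf"(?:[{UNRESERVED}{SUB_DELIMS}@]|{PCT_ENCODED})"

PATTERNS = {
    "segment": re.compile(rf"{PCHAR}*"),
    "segment-nz": re.compile(rf"{PCHAR}+"),
    "segment-nz-nc": re.compile(rf"{PCHAR_NC}+"),
}


def matches(production: str, text: str) -> bool:
    """Return True if text is a complete instance of the named segment production."""
    return PATTERNS[production].fullmatch(text) is not None


for name in PATTERNS:
    # Both the literal and the percent-encoded "@" are accepted by every flavor.
    print(name, matches(name, "user@example"), matches(name, "user%40example"))
```

Only ":" distinguishes `segment-nz-nc` from the others; "@" is accepted everywhere, literal or encoded.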

Is the literal "@" required? As far as I can tell, the answer is no: using the pct-encoded representation ("%40") instead is permitted when "@" is not serving the role of a delimiter. There's nothing explicit, but this observation about unreserved characters (RFC 3986, Section 2.4) is a hint:

When a URI is dereferenced, the components and subcomponents significant to the scheme-specific dereferencing process (if any) must be parsed and separated before the percent-encoded octets within those components can be safely decoded, as otherwise the data may be mistaken for component delimiters. The only exception is for percent-encoded octets corresponding to characters in the unreserved set, which can be decoded at any time. For example, the octet corresponding to the tilde ("~") character is often encoded as "%7E" by older URI processing implementations; the "%7E" can be replaced by "~" without changing its interpretation.

This suggests that pct-encodings of unreserved characters are permitted, even though they are clearly not required. The same should hold for other characters once the delimiters have been resolved.
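The ordering described in the quoted passage (separate on delimiters first, then decode) is what lets a literal "@" and "%40" yield the same segment data while an encoded "/" stays distinct from the "/" delimiter. A minimal sketch in Python with a hypothetical path:

```python
from urllib.parse import unquote

# Hypothetical path: a literal "@", an encoded "@" (%40), and an encoded "/" (%2F)
# that is part of the data rather than a delimiter.
path = "/users/a@b/a%40b/x%2Fy"

# Correct order: split on the "/" delimiter first, then decode each segment.
segments = [unquote(s) for s in path.split("/")[1:]]
print(segments)  # ['users', 'a@b', 'a@b', 'x/y']

# Wrong order: decoding first turns %2F into a delimiter and changes the structure.
wrong = unquote(path).split("/")[1:]
print(wrong)  # ['users', 'a@b', 'a@b', 'x', 'y']
```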

For reference: the unreserved set is pretty much what you would expect.

unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
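Because decoding an unreserved character can never create a delimiter, that step can be applied to a whole URI string without parsing it, whereas "%40" has to be left alone until the components are separated. A minimal sketch, assuming ASCII input; `decode_unreserved` is a hypothetical helper name:

```python
import re
import string

# unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def decode_unreserved(uri: str) -> str:
    """Decode %XX triplets that encode unreserved characters; keep all others as-is."""

    def replace(m):
        ch = chr(int(m.group(1), 16))
        # Only unreserved characters are safe to decode before parsing.
        return ch if ch in UNRESERVED else m.group(0)

    return re.sub(r"%([0-9A-Fa-f]{2})", replace, uri)


print(decode_unreserved("/%7Euser/a%40b"))  # /~user/a%40b
```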

Upvotes: 1

luckygulli

Reputation: 21

If you call a URL like login(:password)@url.com, it will connect you to that endpoint with your credentials, so I would not escape the "@" at that point. But if it appears after the .com, I would escape it, because it should not be used as a separator there.
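The position-based distinction drawn here matches how Python's `urllib.parse.urlsplit` parses a URL: the authority ends at the first "/", so an "@" before the host is read as the userinfo delimiter and one after it stays in the path. A minimal sketch with a hypothetical URL:

```python
from urllib.parse import urlsplit

# Hypothetical URL with credentials in the authority and an "@" in the path.
parts = urlsplit("http://login:password@url.com/files/a@b")

print(parts.netloc)    # login:password@url.com
print(parts.username)  # login
print(parts.password)  # password
print(parts.hostname)  # url.com
print(parts.path)      # /files/a@b
```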

Upvotes: 0
