Chen Xiaofeng

Reputation: 516

iOS 17 NSURL percentage gets double encoded

In iOS 17, NSURL parsing changed from RFC 1738/1808 to RFC 3986 (https://developer.apple.com/documentation/foundation/nsurl/1572047-urlwithstring). According to RFC 3986, the reserved characters consist of

gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="

As I understand it, gen-delims must be encoded, while encoding sub-delims is left to the application. iOS 17 seems to encode only gen-delims, but our application needs sub-delims encoded as well in our macro expansion logic. This was not a problem before iOS 17, because we rely on our own encoding logic for macro values:

[str stringByAddingPercentEncodingWithAllowedCharacters:[[NSCharacterSet characterSetWithCharactersInString:@" !*'\"();:@&=+$,/?%#[]^|{}\\`"] invertedSet]];

But now this causes problems, because every percent sign gets double-encoded.

For example, in the following URL we want to expand the macro [PARTNER] to abc/dev:

http://google.com/ad/1?asseturl=[ASSETURI]&partner=[PARTNER]
// Macro expansion for PARTNER only, because we don't have a value for ASSETURI
http://google.com/ad/1?asseturl=[ASSETURI]&partner=abc/dev
// Encode the macro value with the method above
http://google.com/ad/1?asseturl=[ASSETURI]&partner=abc%2Fdev
// Convert the string to an NSURL: [NSURL URLWithString:url];
http://google.com/ad/1?asseturl=%5BASSETURI%5D&partner=abc%252Fdev

You can see there's an additional %25 in the URL because the percent sign got double-encoded on iOS 17. This did not happen before iOS 17.

So why does NSURL encode the percent sign (although it seems to do this only when the string contains an invalid character)? Is this a bug? RFC 3986 doesn't seem to require encoding the percent sign.

Upvotes: 2

Views: 1088

Answers (1)

Codo

Reputation: 78975

The biggest misconception about URL encoding is that it encodes URLs. It does not. It's for path components and query parameters only.

Encoding an entire URL cannot work. If it is already a URL, it needs no encoding. If it's not a URL, then how should it be treated? Which parts are to be treated as the hostname, path and query parameters?

That's why the behavior of NSURL URLWithString: is problematic. According to the documentation:

NSURL automatically percent- and IDNA-encodes invalid characters to help create a valid URL.

So it tries to fix an invalid URL. But since the URL is invalid, this cannot reliably work.

For your case, it does not work. You feed it this invalid URL:

http://google.com/ad/1?asseturl=[ASSETURI]&partner=abc%2Fdev

The query parameter asseturl has an invalid value (square brackets need encoding), while partner has a value that could be valid or invalid. It's basically a guess whether the value of partner still needs to be URL encoded or not. This is where Apple's implementation has changed.

It's the invalid asseturl parameter triggering the double encoding. Without the invalid parameter, the value of partner is not encoded a second time.
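To check this on your own device, a small comparison along the following lines might help. It is only a sketch: ParsedURLString and CompareParsing are hypothetical helper names, and what the second call logs depends on the OS version and the SDK the app is linked against.

#import <Foundation/Foundation.h>

// Hypothetical helper: returns the string NSURL ends up with for a given input.
static NSString *ParsedURLString(NSString *input) {
    return [NSURL URLWithString:input].absoluteString;
}

static void CompareParsing(void) {
    // Already a valid URL: nothing needs fixing, so it should come back unchanged.
    NSLog(@"%@", ParsedURLString(@"http://google.com/ad/1?partner=abc%2Fdev"));

    // Contains unencoded square brackets: on iOS 17 NSURL re-encodes the string,
    // which, per the question, also re-encodes the '%' in the partner value.
    NSLog(@"%@", ParsedURLString(@"http://google.com/ad/1?asseturl=[ASSETURI]&partner=abc%2Fdev"));
}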

The proper way in any programming language and with any framework or library is to encode each path component and each query value separately.

Using this approach, a valid URL is built in the first place and then passed to NSURL URLWithString:. And it won't be double encoded.

// Characters that are NOT in this list are left unencoded; build the set once and reuse it.
NSCharacterSet *allowed = [[NSCharacterSet characterSetWithCharactersInString:@" !*'\"();:@&=+$,/?%#[]^|{}\\`"] invertedSet];
// Encode each query value separately, including the unexpanded macro.
NSString *partner = [@"abc/dev" stringByAddingPercentEncodingWithAllowedCharacters:allowed];
NSString *asseturi = [@"[ASSETURI]" stringByAddingPercentEncodingWithAllowedCharacters:allowed];
NSURL *url = [NSURL URLWithString:[NSString stringWithFormat:@"http://google.com/ad/1?asseturl=%@&partner=%@", asseturi, partner]];

The result is:

http://google.com/ad/1?asseturl=%5BASSETURI%5D&partner=abc%2Fdev

NSURLComponents and NSURLQueryItem are usually the best approach to construct URLs with query parameters. They take care of encoding. But since you have the additional requirement of encoding the sub-delims class, they might not be the best fit.
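
If you still want to go through NSURLComponents, one option might be its percentEncodedQueryItems property (iOS 11 and later), which takes names and values that are already encoded and is meant to leave them as they are. The following is a minimal sketch under a few assumptions: EncodeQueryValue and BuildAdURL are hypothetical helper names, and instead of your inverted character set the helper uses a whitelist of the RFC 3986 unreserved characters, so that everything else, including all sub-delims, gets encoded. As far as I know, the setter raises an exception when a value is not validly percent-encoded, which is why a strict whitelist seems safer here.

#import <Foundation/Foundation.h>

// Hypothetical helper: percent-encodes every character except the
// RFC 3986 "unreserved" set (ALPHA / DIGIT / "-" / "." / "_" / "~").
static NSString *EncodeQueryValue(NSString *value) {
    NSCharacterSet *unreserved = [NSCharacterSet characterSetWithCharactersInString:
        @"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"];
    return [value stringByAddingPercentEncodingWithAllowedCharacters:unreserved];
}

// Hypothetical helper: builds the URL from pre-encoded query values.
static NSURL *BuildAdURL(NSString *assetURI, NSString *partner) {
    NSURLComponents *components = [NSURLComponents componentsWithString:@"http://google.com/ad/1"];
    // The values are handed over already encoded, so no second encoding pass is needed.
    components.percentEncodedQueryItems = @[
        [NSURLQueryItem queryItemWithName:@"asseturl" value:EncodeQueryValue(assetURI)],
        [NSURLQueryItem queryItemWithName:@"partner" value:EncodeQueryValue(partner)],
    ];
    return components.URL;
}

Calling BuildAdURL(@"[ASSETURI]", @"abc/dev") passes the values %5BASSETURI%5D and abc%2Fdev to NSURLComponents, so no raw string has to be repaired by URLWithString: afterwards.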

Upvotes: 4
