Daniel Åkesson

Reputation: 1360

WKWebView load webpage with special characters

I have a WKWebView that works as a browser. I can't load addresses that contain special characters, such as "http://www.håbo.se" (a Swedish character).

I'm using:

parsedUrl = [parsedUrl stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];

which looks promising, as it creates an address that looks as follows: http://www.h%c3%a5bo.se/

If I enter that in Chrome, it works. But when I try to load it in the WKWebView, I get the error below (all other pages load fine).

Here's the full NSError printed:

Error Domain=NSURLErrorDomain Code=-1003 "A server with the specified hostname could not be found." UserInfo={_WKRecoveryAttempterErrorKey=<WKReloadFrameErrorRecoveryAttempter: 0x7f82ca502290>, NSErrorFailingURLStringKey=http://www.h%c3%a5bo.se/, NSErrorFailingURLKey=http://www.h%c3%a5bo.se/, NSUnderlyingError=0x7f82ca692200 {Error Domain=kCFErrorDomainCFNetwork Code=-1003 "A server with the specified hostname could not be found." UserInfo={NSErrorFailingURLStringKey=http://www.h%c3%a5bo.se/, NSErrorFailingURLKey=http://www.h%c3%a5bo.se/, _kCFStreamErrorCodeKey=8, _kCFStreamErrorDomainKey=12, NSLocalizedDescription=A server with the specified hostname could not be found.}}, 

Upvotes: 4

Views: 2895

Answers (1)

Borys Verebskyi

Reputation: 4268

This one is complicated. From this article:

Resolving a domain name

If the string that represents the domain name is not in Unicode, the user agent converts the string to Unicode. It then performs some normalization functions on the string to eliminate ambiguities that may exist in Unicode encoded text.

Normalization involves such things as converting uppercase characters to lowercase, reducing alternative representations (eg. converting half-width kana to full), eliminating prohibited characters (eg. spaces), etc.

Next, the user agent converts each of the labels (ie. pieces of text between dots) in the Unicode string to a punycode representation. A special marker ('xn--') is added to the beginning of each label containing non-ASCII characters to show that the label was not originally ASCII. The end result is not very user friendly, but accurately represents the original string of characters while using only the characters that were previously allowed for domain names.

For example, the following domain name:

JP納豆.例.jp

converts to the following representation:

xn--jp-cd2fp15c.xn--fsq.jp

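You can use a punycode category such as NSStringPunycodeAdditions (its IDNAEncodedString method is used in the combined code at the end of this answer) to perform this conversion.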
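To illustrate what such a conversion does, here is a minimal sketch of the punycode encoding step from RFC 3492, applied label by label. The function names (PunycodeEncodeLabel, IDNAEncodeHost) are hypothetical and not part of any library. Normalization is only approximated here with NFKC plus lowercasing, which is not the full IDNA mapping, and there is no overflow checking, so treat it as a demonstration rather than a replacement for a tested library.

```objc
#import <Foundation/Foundation.h>

// Punycode parameters from RFC 3492, section 5.
static const uint32_t kBase = 36, kTMin = 1, kTMax = 26, kSkew = 38, kDamp = 700;
static const uint32_t kInitialBias = 72, kInitialN = 128;

// Maps a digit value 0..35 to 'a'..'z' followed by '0'..'9'.
static unichar PunycodeDigit(uint32_t d) {
    return (unichar)(d < 26 ? 'a' + d : '0' + (d - 26));
}

// Bias adaptation function from RFC 3492, section 6.1.
static uint32_t PunycodeAdapt(uint32_t delta, uint32_t numPoints, BOOL firstTime) {
    delta = firstTime ? delta / kDamp : delta / 2;
    delta += delta / numPoints;
    uint32_t k = 0;
    while (delta > ((kBase - kTMin) * kTMax) / 2) {
        delta /= kBase - kTMin;
        k += kBase;
    }
    return k + ((kBase - kTMin + 1) * delta) / (delta + kSkew);
}

// Hypothetical helper: encodes one label. ASCII-only labels come back
// normalized but otherwise unchanged; others get the "xn--" prefix.
static NSString *PunycodeEncodeLabel(NSString *label) {
    // Assumption: NFKC + lowercasing as a rough stand-in for IDNA normalization.
    label = label.precomposedStringWithCompatibilityMapping.lowercaseString;
    // UTF-32 gives one 32-bit unit per code point (no BOM for the explicit-endian encoding).
    NSData *utf32 = [label dataUsingEncoding:NSUTF32LittleEndianStringEncoding];
    const uint32_t *cp = utf32.bytes;
    NSUInteger length = utf32.length / sizeof(uint32_t);

    // Copy the basic (ASCII) code points first.
    NSMutableString *out = [NSMutableString string];
    for (NSUInteger i = 0; i < length; i++) {
        if (cp[i] < 0x80) [out appendFormat:@"%C", (unichar)cp[i]];
    }
    NSUInteger basicCount = out.length;
    if (basicCount == length) return label; // nothing to encode
    if (basicCount > 0) [out appendString:@"-"];

    uint32_t n = kInitialN, delta = 0, bias = kInitialBias;
    NSUInteger handled = basicCount;
    while (handled < length) {
        // Find the smallest code point not yet handled.
        uint32_t m = UINT32_MAX;
        for (NSUInteger i = 0; i < length; i++) {
            if (cp[i] >= n && cp[i] < m) m = cp[i];
        }
        delta += (m - n) * (uint32_t)(handled + 1);
        n = m;
        for (NSUInteger i = 0; i < length; i++) {
            if (cp[i] < n) delta++;
            if (cp[i] == n) {
                // Emit delta as a variable-length integer.
                uint32_t q = delta;
                for (uint32_t k = kBase; ; k += kBase) {
                    uint32_t t = k <= bias ? kTMin : (k >= bias + kTMax ? kTMax : k - bias);
                    if (q < t) break;
                    [out appendFormat:@"%C", PunycodeDigit(t + (q - t) % (kBase - t))];
                    q = (q - t) / (kBase - t);
                }
                [out appendFormat:@"%C", PunycodeDigit(q)];
                bias = PunycodeAdapt(delta, (uint32_t)(handled + 1), handled == basicCount);
                delta = 0;
                handled++;
            }
        }
        delta++;
        n++;
    }
    return [@"xn--" stringByAppendingString:out];
}

// Hypothetical helper: encodes every dot-separated label of a host name.
static NSString *IDNAEncodeHost(NSString *host) {
    NSMutableArray<NSString *> *labels = [NSMutableArray array];
    for (NSString *label in [host componentsSeparatedByString:@"."]) {
        [labels addObject:PunycodeEncodeLabel(label)];
    }
    return [labels componentsJoinedByString:@"."];
}
```

With this sketch, IDNAEncodeHost(@"JP納豆.例.jp") should give the representation quoted above, and the host from the question, www.håbo.se, should become www.xn--hbo-ula.se.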

Resolving a path

If the string is input by the user or stored in a non-Unicode encoding, it is converted to Unicode, normalized using Unicode Normalization Form C, and encoded using the UTF-8 encoding.

The user agent then converts the non-ASCII bytes to percent-escapes.

For example, the following path:

/dir1/引き割り.html

converts to the following representation:

/dir1/%E5%BC%95%E3%81%8D%E5%89%B2%E3%82%8A.html

For this purpose, you may use the following code:

path = [URL.path stringByAddingPercentEncodingWithAllowedCharacters:[NSCharacterSet URLPathAllowedCharacterSet]];

Note that stringByAddingPercentEscapesUsingEncoding: is deprecated, because each URL component or subcomponent has different rules for what characters are valid.
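The path steps described above (normalize to Form C, encode as UTF-8, percent-escape) can be sketched as one small function. EscapedPath is a hypothetical name used only for illustration, and the input is assumed to be an unescaped path.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: normalizes an unescaped path to NFC, then
// percent-escapes every character outside the URL path allowed set.
static NSString *EscapedPath(NSString *rawPath) {
    // Unicode Normalization Form C (canonical composition).
    NSString *nfc = rawPath.precomposedStringWithCanonicalMapping;
    // Escapes are produced from the UTF-8 bytes of each disallowed character.
    return [nfc stringByAddingPercentEncodingWithAllowedCharacters:
                [NSCharacterSet URLPathAllowedCharacterSet]];
}
```

EscapedPath(@"/dir1/引き割り.html") should return the escaped path shown above. Because "%" is not in the allowed set, passing a string that is already escaped would escape it a second time.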

Putting it all together

Resulting code:

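#import <Foundation/Foundation.h>
#import "NSStringPunycodeAdditions.h" // from https://github.com/OnionBrowser/iOS-OnionBrowser/blob/master/OnionBrowser/NSStringPunycodeAdditions.h

@interface NSURL (Normalization)
- (NSURL *)normalizedURL;
@end

@implementation NSURL (Normalization)

- (NSURL *)normalizedURL {
    NSURLComponents *components = [NSURLComponents componentsWithURL:self resolvingAgainstBaseURL:YES];
    // Convert a Unicode host to its ASCII (punycode) form.
    components.host = [components.host IDNAEncodedString];
    // components.path is returned unescaped; assign through percentEncodedPath so the new escapes are not encoded a second time.
    components.percentEncodedPath = [components.path stringByAddingPercentEncodingWithAllowedCharacters:[NSCharacterSet URLPathAllowedCharacterSet]];
    return components.URL;
}

@end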

Unfortunately, actual URL "normalization" is more complicated - you need to handle all remaining URL components too. But I hope I've answered your question.
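As a rough sketch of what handling the remaining components could look like, the query and the fragment can be escaped the same way as the path, each with its own allowed character set. EscapedQuery and EscapedFragment are hypothetical names, and the inputs are assumed to be unescaped.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: escapes an unescaped query string.
static NSString *EscapedQuery(NSString *rawQuery) {
    return [rawQuery stringByAddingPercentEncodingWithAllowedCharacters:
                [NSCharacterSet URLQueryAllowedCharacterSet]];
}

// Hypothetical helper: escapes an unescaped fragment.
static NSString *EscapedFragment(NSString *rawFragment) {
    return [rawFragment stringByAddingPercentEncodingWithAllowedCharacters:
                [NSCharacterSet URLFragmentAllowedCharacterSet]];
}
```

Note that the query set leaves "&" and "=" untouched, so a parameter value that contains those characters needs separate treatment, for example through NSURLQueryItem.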

Upvotes: 2
