Reputation: 1249
Why does converting a String to a URL in Swift 4.2 and then converting the URL back to a String using url.path change the encoding of special characters like German umlauts (ä, ö, ü), even if I use UTF-8 encoding?
I wrote some sample code to show my problem. I encoded the strings to base64 in order to show that there is a difference.
I also have a similar unsolved problem with special characters and swift here.
import Foundation

let string = "/path/to/file"
let stringUmlauts = "/path/to/file/with/umlauts/testäöü"

// Base64 of the UTF-8 bytes of the original strings
let base64 = Data(string.utf8).base64EncodedString()
let base64Umlauts = Data(stringUmlauts.utf8).base64EncodedString()
print(base64, base64Umlauts)

// Round-trip through URL, then Base64 of the UTF-8 bytes of url.path
let url = URL(fileURLWithPath: string)
let urlUmlauts = URL(fileURLWithPath: stringUmlauts)
let base64Url = Data(url.path.utf8).base64EncodedString()
let base64UrlUmlauts = Data(urlUmlauts.path.utf8).base64EncodedString()
print(base64Url, base64UrlUmlauts)
The base64 and base64Url strings stay the same, but base64Umlauts and base64UrlUmlauts are different.
"L3BhdGgvdG8vZmlsZQ==" for base64
"L3BhdGgvdG8vZmlsZQ==" for base64Url
"L3BhdGgvdG8vZmlsZS93aXRoL3VtbGF1dHMvdGVzdMOkw7bDvA==" for base64Umlauts
"L3BhdGgvdG8vZmlsZS93aXRoL3VtbGF1dHMvdGVzdGHMiG/MiHXMiA==" for base64UrlUmlauts
When I put the base64Umlauts and base64UrlUmlauts strings into an online Base64 decoder, they both show /path/to/file/with/umlauts/testäöü, but the ä, ö, ü are encoded differently even though they look identical.
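The difference can also be inspected without an online decoder. The following is a minimal sketch that decodes the two Base64 strings from the output above and prints their trailing bytes as hex. The helper hexTail and the constant names are hypothetical and exist only for this illustration.

```swift
import Foundation

// Base64 strings copied from the output above.
let encodedFromString = "L3BhdGgvdG8vZmlsZS93aXRoL3VtbGF1dHMvdGVzdMOkw7bDvA=="
let encodedFromURLPath = "L3BhdGgvdG8vZmlsZS93aXRoL3VtbGF1dHMvdGVzdGHMiG/MiHXMiA=="

// Hypothetical helper (not a Foundation API): decodes a Base64 string
// and returns its last `count` bytes as space-separated hex pairs.
func hexTail(ofBase64 encoded: String, count: Int) -> String {
    guard let data = Data(base64Encoded: encoded) else { return "<invalid Base64>" }
    return data.suffix(count)
        .map { byte -> String in
            let hex = String(byte, radix: 16, uppercase: true)
            return hex.count == 1 ? "0" + hex : hex
        }
        .joined(separator: " ")
}

print(hexTail(ofBase64: encodedFromString, count: 6))  // C3 A4 C3 B6 C3 BC
print(hexTail(ofBase64: encodedFromURLPath, count: 9)) // 61 CC 88 6F CC 88 75 CC 88
```

The first string ends in three two-byte sequences, while the second ends in three plain ASCII letters, each followed by the two bytes CC 88.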
Upvotes: 3
Views: 2120
Reputation: 318774
stringUmlauts.utf8 encodes the precomposed Unicode characters ä, ö, ü (U+00E4, U+00F6, U+00FC).
But urlUmlauts.path.utf8 encodes the base letters a, o, u, each followed by the combining diaeresis ¨ (U+0308).
This is why you get different base64 encoding - the characters look the same but are actually encoded differently.
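To make the two encodings concrete, here is a small sketch that builds both forms of ä from explicit Unicode escapes (so an editor cannot silently normalize the literals) and prints their scalars and UTF-8 bytes. The constant names are illustrative only.

```swift
// Both constants render as "ä" but are built from different scalars.
let precomposedA = "\u{E4}"    // LATIN SMALL LETTER A WITH DIAERESIS
let decomposedA = "a\u{308}"   // "a" followed by COMBINING DIAERESIS

// Unicode scalar values in hex
print(precomposedA.unicodeScalars.map { String($0.value, radix: 16) }) // ["e4"]
print(decomposedA.unicodeScalars.map { String($0.value, radix: 16) })  // ["61", "308"]

// UTF-8 bytes
print(Array(precomposedA.utf8)) // [195, 164]
print(Array(decomposedA.utf8))  // [97, 204, 136]
```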
What's really interesting is that Array(stringUmlauts) and Array(urlUmlauts.path) are the same. The difference doesn't appear until you perform the UTF-8 encoding of the otherwise exact same String values.
Since the base64 encoding is irrelevant, here's a more concise test:
import Foundation

let stringUmlauts = "/path/to/file/with/umlauts/testäöü"
let urlUmlauts = URL(fileURLWithPath: stringUmlauts)
print(stringUmlauts, urlUmlauts.path) // both print the same visible text
let rawStr = stringUmlauts
let urlStr = urlUmlauts.path
print(rawStr == urlStr) // true
print(Array(rawStr) == Array(urlStr)) // true
print(Array(rawStr.utf8) == Array(urlStr.utf8)) // false: the underlying bytes differ
So how is the UTF-8 encoding of two equal strings different?
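The explanation is that Swift compares String and Character values by Unicode canonical equivalence, not by their stored code units, so a precomposed and a decomposed spelling of the same text compare equal. The sketch below illustrates this with hand-built strings that stand in for the two values above; the constant names are assumptions made for the example.

```swift
// Same visible text "testäöü", spelled two ways.
let composedText = "test\u{E4}\u{F6}\u{FC}"
let decomposedText = "testa\u{308}o\u{308}u\u{308}"

// Equality and Character count follow canonical equivalence.
print(composedText == decomposedText)           // true
print(composedText.count, decomposedText.count) // 7 7

// The scalar and UTF-8 views expose the stored representation.
print(composedText.unicodeScalars.count, decomposedText.unicodeScalars.count) // 7 10
print(composedText.utf8.count, decomposedText.utf8.count)                     // 10 13

// A strict comparison can be done on the scalar view instead of on String.
let sameScalars = composedText.unicodeScalars.elementsEqual(decomposedText.unicodeScalars)
print(sameScalars) // false
```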
One solution is to call precomposedStringWithCanonicalMapping on the result of path, which normalizes the string to its precomposed form (Unicode Normalization Form C).
let urlStr = urlUmlauts.path.precomposedStringWithCanonicalMapping
Now you get true from:
print(Array(rawStr.utf8) == Array(urlStr.utf8)) // now true
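If the origin of a string is unknown, it may be safer to normalize both sides before comparing bytes. The following is only a sketch: nfcBytes is a hypothetical helper, and fromPath is a hand-built stand-in for the decomposed value that url.path returned above.

```swift
import Foundation

// Hypothetical helper: UTF-8 bytes of `text` after normalizing it to
// Unicode Normalization Form C (precomposed characters).
func nfcBytes(_ text: String) -> [UInt8] {
    return Array(text.precomposedStringWithCanonicalMapping.utf8)
}

let typed = "test\u{E4}\u{F6}\u{FC}"              // precomposed, as typed in source
let fromPath = "testa\u{308}o\u{308}u\u{308}"     // decomposed stand-in for url.path

print(Array(typed.utf8) == Array(fromPath.utf8))  // false
print(nfcBytes(typed) == nfcBytes(fromPath))      // true
```

Normalizing both inputs rather than only the URL path means the comparison does not depend on which form either string happened to arrive in.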
Upvotes: 4