Satachito

Reputation: 5888

How to read a string that contains invalid UTF-8

I'm trying to read JSON from this address:

http://www.defense.gov/data.json

But the data contains an illegal byte, 0x92, on its 2771st line, so

try! String( contentsOf: URL( string: "http://www.defense.gov/data.json" )!, encoding: .utf8 )

crashes with the error below.

fatal error: 'try!' expression unexpectedly raised an error: Error Domain=NSCocoaErrorDomain Code=261 "The file “data.json” couldn’t be opened using text encoding Unicode (UTF-8)." UserInfo={NSURL=http://www.defense.gov/data.json, NSStringEncoding=4}: file /Library/Caches/com.apple.xbs/Sources/swiftlang/

Is there any way to read the JSON from this site without writing my own string reader?

Upvotes: 0

Views: 197

Answers (1)

OOPer

Reputation: 47896

Looking at the content, every byte other than that single 0x92 is in the ASCII range (0x00...0x7F). So you can try reading it as ISO-8859-1 (also known as ISO Latin-1), which maps every byte value 0x00...0xFF to U+0000...U+00FF, so decoding should not fail on any byte.
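To see the difference between the two encodings without hitting the network, here is a minimal sketch. The sample bytes are made up for illustration (they are not taken from the real feed); they are ASCII with one stray 0x92 in the middle.

```swift
import Foundation

// Hypothetical sample: ASCII JSON text with a single stray 0x92 byte.
var bytes = Array("{\"title\":\"DoD".utf8)
bytes.append(0x92)
bytes += Array("s data\"}".utf8)
let sample = Data(bytes)

// Strict UTF-8 decoding rejects the lone 0x92 (a continuation byte with no
// lead byte), so this initializer returns nil.
let asUTF8 = String(data: sample, encoding: .utf8)

// ISO Latin-1 assigns a code point to every byte value, so decoding succeeds;
// the 0x92 byte becomes the C1 control character U+0092.
let asLatin1 = String(data: sample, encoding: .isoLatin1)
```

The U+0092 that ends up in the Latin-1 string is an invisible control character.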

var rawStr = try! String(contentsOf: URL(string: "http://www.defense.gov/data.json")!, encoding: .isoLatin1)

You can remove that character if needed.

rawStr = rawStr.replacingOccurrences(of: "\u{92}", with: "")

Then re-encode it as valid UTF-8 data:

let dataUTF8 = rawStr.data(using: .utf8)!

Re-encoded data can be processed with JSONSerialization:

let json = try! JSONSerialization.jsonObject(with: dataUTF8) as! [String: Any]

All of the code above is experimental. The uses of try!, as! and forced unwrapping (!) are not safe, and in a real app you should handle those failures gracefully instead of crashing. Also, String(contentsOf:) blocks until the download finishes, which can take an arbitrarily long time on a poor connection, so you should not call it on the main thread in a real app.
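One way the same steps might look with the forced operations removed and the download moved off the main thread is sketched below. The names FeedError, parseLenientJSON and loadFeed are hypothetical, made up for this example; the approach (decode as Latin-1, drop U+0092, re-encode, parse) is the one described above, and URLSession is swapped in for String(contentsOf:).

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLSession lives here on Linux
#endif

// Hypothetical error type for this sketch.
enum FeedError: Error {
    case undecodable
    case notADictionary
}

// Decodes the bytes as ISO Latin-1, drops U+0092, re-encodes as UTF-8,
// and parses the result as a JSON object.
func parseLenientJSON(_ raw: Data) throws -> [String: Any] {
    guard let text = String(data: raw, encoding: .isoLatin1) else {
        throw FeedError.undecodable
    }
    let cleaned = text.replacingOccurrences(of: "\u{92}", with: "")
    let utf8Data = Data(cleaned.utf8)
    let object = try JSONSerialization.jsonObject(with: utf8Data)
    guard let dict = object as? [String: Any] else {
        throw FeedError.notADictionary
    }
    return dict
}

// Downloads asynchronously; the completion handler is called on a
// URLSession background queue, not on the main thread.
func loadFeed(from url: URL,
              completion: @escaping (Result<[String: Any], Error>) -> Void) {
    URLSession.shared.dataTask(with: url) { data, _, error in
        if let error = error {
            completion(.failure(error))
            return
        }
        do {
            completion(.success(try parseLenientJSON(data ?? Data())))
        } catch {
            completion(.failure(error))
        }
    }.resume()
}
```

If the result drives UI, hop back to the main queue inside the completion handler (for example with DispatchQueue.main.async) before touching any views.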

Upvotes: 1
