Reputation: 2109
I have a JSON file (just an array of dicts), 60 MB in size. In PHP, parsing it takes 2 seconds, but in Swift it takes SEVEN. This is ridiculous. Am I doing something wrong, or what? Swift code:
let json = try! JSONSerialization.jsonObject(
    with: try! Data(
        contentsOf: URL(
            fileURLWithPath: "/some/file/path/to.json"
        )
    )
) as! [[AnyHashable: Any]]
I simplified the code so it's a single expression, but the slow part is JSONSerialization.jsonObject; I measured it explicitly (loading the Data from the file is fast, as expected). The PHP code is pretty straightforward: json_decode(file_get_contents()).
It's worth mentioning that building in release mode (with optimizations) didn't improve the situation.
UPD: After profiling the app, I discovered that the bottleneck is casting the result to [[AnyHashable: Any]]. Changing it to [[String: Any]] improved things a bit (from 7 seconds to ~5.3), but it's still painfully slow.
So basically the question now is: why is casting so slow, and is there a faster way of working with large JSON objects (or any other serialized data)?
Upvotes: 1
Views: 1975
Reputation: 299355
I'm not going to judge you about encoding 60MB in JSON… ok, I'm going to judge you a little bit. That's a crazy format to store this much data. Got that out of my system; let's work on making it faster.
First, can you skip straight to Swift 4? If so, get rid of JSONSerialization and go straight to the new JSONDecoder. It avoids a lot of type problems. That said, it may or may not be any faster.
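As a rough sketch, assuming your records have a known shape (the Record fields below are invented; swap in whatever your objects actually contain), decoding straight into a Codable type looks like this:

import Foundation

// Hypothetical record type; substitute the fields your JSON actually has.
struct Record: Decodable {
    let id: Int
    let name: String
}

let url = URL(fileURLWithPath: "/some/file/path/to.json")
let data = try! Data(contentsOf: url)

// Decodes the whole array directly into [Record]; no Any, no casting.
let records = try! JSONDecoder().decode([Record].self, from: data)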
Let's get to the "why is casting so slow" question. Simple. Casting is fast. You're not casting. You're converting. AnyHashable is a type-eraser; it's a completely different struct type from String:
public struct AnyHashable {
You have to box a String into an AnyHashable struct. That's pretty fast actually (because of how copy-on-write works), but it means the dictionary is a completely different dictionary. You're forcing it to make a complete copy.
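A tiny illustration with made-up data: pulling the parsed result out as [[String: Any]] is a cheap cast, while asking for [[AnyHashable: Any]] forces every key to be boxed and every dictionary rebuilt.

import Foundation

// Stand-in for a parsed result (invented data, not the asker's file).
let source: [[String: Any]] = [["id": 1, "name": "a"], ["id": 2, "name": "b"]]
let parsed: Any = source

let strings = parsed as! [[String: Any]]      // cheap: same dictionaries back
let boxed = parsed as! [[AnyHashable: Any]]   // every key boxed into AnyHashable,
                                              // every dictionary copied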
The way I have historically handled massive JSON arrays is to parse them partially by hand. Throw away the first [, collect a single JSON object at a time, parse it, and then put the result onto an Array. That way you never have to pull all of the data into memory and you don't need to burn 600MB of high water mark. This technique obviously works best if you have some control over the input JSON. For example, I usually cheat a little and write the JSON like this:
[
{ ... JSON ... },
{ ... JSON ... }
]
That makes it really fast and easy to parse the records (just split on newlines). (I happen to love this also because it's friendly to commandline tools like grep and awk with no JSON parsing at all). It's still legal JSON, but with a little special knowledge I can parse it much faster.
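A minimal sketch of that trick, assuming the file is laid out exactly as above ("[" on the first line, "]" on the last, and one complete object per line in between); the function name and path handling are placeholders:

import Foundation

func parseRecords(atPath path: String) throws -> [[String: Any]] {
    var records: [[String: Any]] = []

    // For simplicity this reads the whole file as text; a true streaming
    // version would read and split chunk by chunk to keep memory flat.
    let text = try String(contentsOfFile: path, encoding: .utf8)
    for line in text.split(separator: "\n") {
        var record = line.trimmingCharacters(in: .whitespaces)
        if record == "[" || record == "]" || record.isEmpty { continue }
        if record.hasSuffix(",") { record.removeLast() }   // drop the trailing separator
        // Parse one record at a time instead of the whole blob.
        let object = try JSONSerialization.jsonObject(with: Data(record.utf8))
        records.append(object as! [String: Any])
    }
    return records
}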
For benchmarking, I also recommend you build this in ObjC to separate NSJSONSerialization from "bridging ObjC types to Swift." NSJSONSerialization is generally considered a pretty fast parser. Bridging to Swift is expensive if you're not very careful (as discussed above). (I love Swift, but it is a very difficult language to reason about performance in.)
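If you'd rather stay in Swift for a first pass, a quick timing split can at least show how much of the time is the parser and how much is the cast/bridge step (CFAbsoluteTimeGetCurrent is coarse, but fine at this scale; the path is a placeholder):

import Foundation

let data = try! Data(contentsOf: URL(fileURLWithPath: "/some/file/path/to.json"))

let t0 = CFAbsoluteTimeGetCurrent()
let raw = try! JSONSerialization.jsonObject(with: data)   // ObjC objects, no conversion yet
let t1 = CFAbsoluteTimeGetCurrent()
let swifty = raw as! [[String: Any]]                      // bridging into Swift dictionaries
let t2 = CFAbsoluteTimeGetCurrent()

print("parse: \(t1 - t0)s, cast: \(t2 - t1)s, records: \(swifty.count)")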
It looks like there's another player in this space called JASON, but I haven't tried it yet. (There used to be a very famous package called JSONKit that was insanely fast by playing ObjC tricks that would make your skin crawl, but it amazingly worked incredibly well and so must be forgiven. But those tricks finally caught up with it, and I don't think it even works anymore.)
Upvotes: 3