How to use a mutable array as dictionary value in a performant way?

Question

Is there a way to put mutable arrays as values to Swift dictionaries without performance losses? When I was searching around, most approaches seemed to suggest grabbing the array from the dictionary as a var, adding to it, and then setting it back to the dictionary (example). However, this seems to be very inefficient.

I benchmarked this approach against Objective-C and Swift with NSMutableArrays, and with 1M entries it seems to be 180x slower, and seems to be exhibiting non-linear growth, suggesting that the compiler is not able to sufficiently optimize this, either.

import Foundation
 
let startDate = Date()
var dict = [Int:[Int]]()
for i in 0..<1_000_000 {
    let key = i % 7
    var arr = dict[key] ?? [Int]()
    arr.append(i)
    dict[key] = arr
}
print("Time spent: \(Date().timeIntervalSinceReferenceDate - startDate.timeIntervalSinceReferenceDate)")
 
// $ swiftc MutableDictionary.swift -o MutableDictionarySwift -framework Foundation -O
// $ ./MutableDictionarySwift
// Time spent: 18.316431999206543

Objective-C:

#import 
 
int main(int argc, char *argv[]) {
    @autoreleasepool {
        NSDate *startDate = [NSDate date];
        NSMutableDictionary *dict = [NSMutableDictionary new];
        for (int i = 0; i < 1000000; ++i) {
            NSNumber *key = @(i % 7);
            NSMutableArray *arr = dict[key] ?: [NSMutableArray new];
            [arr addObject:@(i)];
            dict[key] = arr;
        }
        NSLog(@"Number of elements: %lu", dict.count);
        NSLog(@"Time spent: %f", [NSDate date].timeIntervalSinceReferenceDate - startDate.timeIntervalSinceReferenceDate);
    }
}
 
// $ clang MutableDictionary.m -o MutableDictionaryObjc -framework Foundation -O3
// $ ./MutableDictionaryObjc
// 2020-11-26 17:41:17.686 MutableDictionaryObjc[12012:775366] Number of elements: 7
// 2020-11-26 17:41:17.686 MutableDictionaryObjc[12012:775366] Time spent: 0.098124

Swift with NSMutableArray:

import Foundation
 
let startDate = Date()
var dict = [Int:NSMutableArray]()
for i in 0..<1_000_000 {
    let key = i % 7
    let arr = dict[key] ?? NSMutableArray()
    arr.add(i)
    dict[key] = arr
}
print("Time spent: \(Date().timeIntervalSinceReferenceDate - startDate.timeIntervalSinceReferenceDate)")
 
// $ swiftc MutableDictionary2.swift -o MutableDictionarySwift2 -framework Foundation -O  
// $ ./MutableDictionarySwift2 
// Time spent: 0.10528397560119629

Cristik · Accepted Answer

Note that both Swift array and NSMutableArray under the hood kinda do the same things: when they have no more allocated space for new items, they allocate a new buffer. And the new buffer allocation logic should be fairly similar in both Swift and Objective-C.

The problem with your Swift array code, as Alexander pointed out in his comment, is the fact that you create a local copy of the array that is stored into the dictionary. And as soon as you mutate that local array, the copy-on-write mechanism triggers a copy of the array buffer, which results into another heap allocation. And the bigger the array, the more it takes to duplicate the buffer.

The nonlinear growth you see is due to the fact that the array buffer increases don't occur that often, as the array implementations have a pretty good heuristic for the capacity increase, in order to avoid allocations every time a new element is added. However the buffer duplication due to copy-on-write happens at every iteration.

You will get better results with the Swift arrays if you mutate them in place:

for i in 0..<1_000_000 {
    let key = i % 7
    dict[key, default: [Int]()].append(i)
}

Both measurements and Instruments show a dramatically improved performance with the above change, with the Swift code being faster than the Objective-C one.

How to use a mutable array as dictionary value in a performant way?

Answers (1)

Related Questions