SaganRitual
SaganRitual

Reputation: 3203

Swift: intermittent, inconsistent, unexpected behavior using Set.insert()

Edit: I never thought to restart Xcode after deleting the derived data. Now everything works exactly as expected.

I'm running into intermittent failures in my app. I've narrowed it down to some strangeness with Set.insert(). Sometimes the insert results in a call to my == function, other times not, for no apparent reason. Here's the best stripped-down sample I can come up with; it runs in playground.

// Results are the same whether I use Equatable or not
struct ID: Hashable, Equatable {
    let idNumber: Int

    // Results are the same whether I implement this or not
    func hash(into hasher: inout Hasher) {
        hasher.combine(idNumber)
    }

    // Always return true; it doesn't matter. What matters
    // is that sometimes the Set.insert() doesn't even
    // call this function.
    static func == (_ lhs: ID, _ rhs: ID) -> Bool {
        print("(eq)", terminator: ""); return true
    }
}

let id0 = ID(idNumber: 0)
let id1 = ID(idNumber: 1)

var uniqueIDs = Set<ID>()

print("a", terminator: "")
uniqueIDs.insert(id0)
print("b", terminator: "")
uniqueIDs.insert(id1)
print("c", terminator: "")

If I run this ten times in the playground, about half the time I'll see eq in the output, and half the time not. That is, about half the time Set.insert() doesn't call my == before trying the insert.

I read up on Swift sets, and didn't find anything that would shed any light. I sort of thought that if this were the intended behavior, it would be documented with a big warning sign attached. The lack of such warnings suggests that I'm misusing Sets, but I don't know what I'm doing wrong. What doc, or which SO answer, have I missed?

Upvotes: 0

Views: 184

Answers (2)

Alexander
Alexander

Reputation: 63341

Set has no reason to call == on your type if there are not value collisions. I this is a red herring.

Set calls hash(into hasher: inout Hasher) on your values, and then takes the modulo of the set's internal array size. The result is an index at which the value (if it already exists in the set) should be. Naturally, this process makes it possible for multiple values, after hashing and taking modulo, to end up in the same array slot.

To compensate for this, rather than the elements being stored directly in the array slots, they're stored by a linked list. Conceptually, the items within the same slot are called a "bucket". When looking up an element Set uses the hash value to find the right bucket, but needs to iterate through the linked list to find the exact element. At this point, hashes are no longer useful for identitying elements, so Set uses == checks until it finds the right match. This is usually pretty efficient, because the Set should be making the array large enough that buckets are small and contain very few collisions.

Because finding an element within a bucket is O(N), if you can force many hash collisions, then you can force Set's O(1) insert/remove/check operations to degenerate into O(N) traversals over the entire Set's elements (since you could make all elements map to a single bucket. To counter DOS vulnerability, modern associative data structures uses a "seed" that is randomly chosen every time the app is run, and use that to scramble the hashes. This way, it becomes very difficult to craft payloads with identical hash values (which would cause the over-sized bucket issue.) That's where your non-determinism is coming from. See PSA: The stdlib now uses randomly seeded hash values.

Fundamentally, Set<T> is really just a Dictionary of type [T: Void]. So if you read into how hash-based associative data structures (other common names are as Dictionaries, Hashes, HashMaps, etc.) work, much of the same applies.

Upvotes: 2

Evgeniy Kleban
Evgeniy Kleban

Reputation: 6965

Look like playground somehow broken. That's because, the same code i past in blank app project have different output that the same code in playground. Please take a look:

@UIApplicationMain
class AppDelegate: UIResponder, UIApplicationDelegate {

    var window: UIWindow?


    func application(_ application: UIApplication, didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
        // Override point for customization after application launch.
        doSmth()
        return true
    }

    func doSmth(){


        // Results are the same whether I use Equatable or not
        struct ID: Hashable, Equatable {
            let idNumber: Int

            // Results are the same whether I implement this or not
            func hash(into hasher: inout Hasher) {
                hasher.combine(idNumber)
            }

            // Always return true; it doesn't matter. What matters
            // is that sometimes the Set.insert() doesn't even
            // call this function.
            static func == (_ lhs: ID, _ rhs: ID) -> Bool {
                // print("(eq)", terminator: "");
                // return false
                return true
            }
        }

        let id0 = ID(idNumber: 0)
        let id1 = ID(idNumber: 1)
        let id2 = ID(idNumber: 2)

        var uniqueIDs = Set<ID>([id0, id1])

        uniqueIDs.insert(id0)
        uniqueIDs.insert(id1)
        uniqueIDs.insert(id2)
        uniqueIDs.insert(id0)
        uniqueIDs.insert(id1)

        uniqueIDs.forEach{ value in
            print("value - \(value)")
        } 
    }
}

It prints out:

value - ID(idNumber: 0)
value - ID(idNumber: 1)
value - ID(idNumber: 2)

No duplicates (even when i tried to add duplicates).

Upvotes: 0

Related Questions