philipp

Reputation: 1813

QHash: Any weak spots performance-wise, besides rehashing? Why not?

This is more of a theoretical question. I have to admit that I did some rather sophisticated performance tests on this a while ago, but I cannot find the source code anymore.

Some words about the type of application: I am focusing on scenarios with really large numbers of entries, from 50,000 up to several million, while memory consumption does not really matter.

I fully understand the basic concept of a hash data structure, why it generally has constant access times, and why rehashing is required at some point.

Any possible key is mapped to a certain slot in the hash structure. Of course, collisions may occur, resulting in multiple keys being mapped to the same slot. This is where implementation details come into play. As far as I know, there is some kind of logic using the "next" slot if the initially assigned slot is occupied.
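To make it clearer what I mean by a key being mapped to a slot, here is a tiny sketch (the table size is arbitrary, and the real QHash internals are certainly more involved than a plain modulo):

```cpp
#include <QHash>
#include <QStringList>
#include <QDebug>

int main() {
    const uint slotCount = 8;                          // arbitrary table size for illustration
    const QStringList keys = {"alpha", "beta", "gamma", "delta"};

    for (const QString &key : keys) {
        // qHash() yields the raw hash value; reducing it modulo the table
        // size gives the slot the key is initially mapped to. Two keys with
        // the same reduced value collide in that slot.
        const auto slot = qHash(key) % slotCount;
        qDebug() << key << "-> slot" << slot;
    }
    return 0;
}
```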

My feeling is that there has to be a weak spot somewhere, performance-wise. Take a really big QHash, filled just below its capacity, from which keys are then removed at random while new keys are added (without ever increasing the total number of stored keys, so that it never rehashes): I would think this has to lead to severe performance degradation in the long term.

Filling a QHash to just below its capacity with random keys should result in a considerable number of collisions. Looking up a key affected by collisions requires multiple slots to be inspected, which costs performance. Removing keys and adding new random keys in their place should make things even worse: contiguous sequences of colliding keys become fragmented, and a colliding key occupies a 'foreign' slot, forcing the key actually mapped to that slot to be stored somewhere else, even if the foreign slot is freed later...

In short, I would expect that in the given scenario (performing deletes and inserts on a QHash which is always kept just below its capacity) performance degrades in the long run, either because lookup times increase due to the growing disorder, or because of periodic reordering.
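Concretely, the kind of churn I have in mind looks roughly like this (a sketch, not my original test code; sizes and iteration counts are arbitrary):

```cpp
#include <QHash>
#include <QVector>
#include <QElapsedTimer>
#include <QRandomGenerator>
#include <QDebug>

int main() {
    const int targetSize = 1'000'000;          // "really big", but arbitrary
    QHash<quint32, int> hash;
    QVector<quint32> keys;
    keys.reserve(targetSize);

    // Fill the hash with random keys up to the target size.
    while (hash.size() < targetSize) {
        const quint32 k = QRandomGenerator::global()->generate();
        if (!hash.contains(k)) {
            hash.insert(k, 0);
            keys.append(k);
        }
    }

    // Churn: repeatedly remove one random existing key and insert a new
    // random one, keeping the total size constant so no rehash is needed.
    QElapsedTimer timer;
    timer.start();
    for (int i = 0; i < 10'000'000; ++i) {
        const quint32 k = QRandomGenerator::global()->generate();
        if (hash.contains(k))
            continue;                          // rare duplicate key, skip it
        const int idx = QRandomGenerator::global()->bounded(int(keys.size()));
        hash.remove(keys[idx]);
        hash.insert(k, 0);
        keys[idx] = k;
    }
    qDebug() << "churn took" << timer.elapsed() << "ms";
    return 0;
}
```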

However, when I made some effort to demonstrate this performance degradation once (as I said, I cannot find that project anymore, so I stand here bare-handed, I'm afraid), I could not find any.

Is there some special magic in how QHash handles collisions that I am not aware of?

Upvotes: 0

Views: 749

Answers (1)

bolov

Reputation: 75795

tl;dr;

Is there some special magic in how QHash handles collisions that I am not aware of?

There is no magic. You just misunderstood how hash maps work.

(*) I will discuss the general concept of a hash map, not the specific implementation QHash. Although there are several approaches to handling collisions, what I describe here is the most common pattern. It is used, among others, by C++ std::unordered_map, Qt QHash, and Java HashMap.


Your terminology with "slots" is a bit confusing. I first thought that by slot you meant "bucket", but I now think you mean an element of a bucket.

So in a hash map, colliding keys are stored in the same bucket. The bucket can be any container, from a vector to a list.

Finding a bucket is O(1) and finding a key inside a bucket is O(k), where k is the bucket's length. So key access is constant in best case and linear in worst case.
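You can inspect this structure directly with std::unordered_map, which exposes its buckets in its public API (a small sketch):

```cpp
#include <iostream>
#include <string>
#include <unordered_map>

int main() {
    std::unordered_map<int, std::string> map;
    for (int i = 0; i < 20; ++i)
        map[i] = "value " + std::to_string(i);

    std::cout << "buckets: " << map.bucket_count()
              << ", load factor: " << map.load_factor() << '\n';

    // Looking up a key means: jump to its bucket in O(1), then scan the
    // (usually very short) chain of entries stored in that bucket.
    const int key = 7;
    const std::size_t b = map.bucket(key);
    std::cout << "key " << key << " lives in bucket " << b
              << ", which holds " << map.bucket_size(b) << " entry/entries\n";
    return 0;
}
```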

You seem to assume that the number of buckets increases only when the hash map fills its capacity. Well, there is no such thing as a capacity for a hash map (as there is for a vector, for instance). So the situation you describe ("having a hash with a capacity of, let's say, 100, where in the worst case all 100 elements collide and are stored in the same bucket") can't happen.

For a hash map you have a measure called the "load factor", which is the average number of elements per bucket (size / bucket_count). The hash will increase the number of buckets (and recompute the hash of every element, repositioning them) when the load factor exceeds a threshold, the max load factor. Performance is assured first and foremost by the quality of the hash function, which must ensure that keys are spread uniformly across all buckets. But no matter how good the hash function is, you can still have situations where some buckets are considerably larger than the rest. The fail-safe for that is the aforementioned max load factor (see the sketch after the list below).

As a consequence, the max load factor serves two purposes:

  • it acts as a "logical capacity" if you will: it makes the hash naturally increase the buckets count in the scenario that elements are added to the hash uniformly, making all the buckets too large.

  • it acts as a "fail safe" for the hash function. It increases the buckets count in the (rare) scenario that you have multiple keys colliding on a small subset of buckets.

Upvotes: 1
