Reputation: 97
I have a collection of event objects, each consisting of an ID and a date. I want to store these objects in a way where I can efficiently look them up by ID, and also delete all events dated before some point in time.
I was thinking about using a HashMap and TreeMap, where the HashMap holds IDs and the TreeMap stores the elements sorted by date. This gives O(1) lookup by ID and lets me efficiently remove all old events. I also tried using a sorted TreeMap of the dates without the hash table of IDs.
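The HashMap + TreeMap approach described above can be sketched as follows. This is a minimal illustration, not code from the post: the `Event` class, its field names, and the assumption that each event has a distinct timestamp are all made up for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Sketch of the question's idea: a HashMap for ID lookup plus a
// TreeMap ordered by date for range deletion. Assumes timestamps
// are unique per event (otherwise the TreeMap value would need to
// be a collection of events).
class EventStore {
    static final class Event {
        final String id;
        final long time; // e.g. epoch millis
        Event(String id, long time) { this.id = id; this.time = time; }
    }

    private final Map<String, Event> byId = new HashMap<>();
    private final TreeMap<Long, Event> byTime = new TreeMap<>();

    void add(Event e) {          // O(log n): the TreeMap insert dominates
        byId.put(e.id, e);
        byTime.put(e.time, e);
    }

    Event lookup(String id) {    // expected O(1)
        return byId.get(id);
    }

    // Delete every event strictly before 'cutoff'. headMap returns a
    // live view of byTime, so clearing it removes those entries from
    // the underlying map; cost is O(k + log n) for k deleted events.
    void deleteBefore(long cutoff) {
        Map<Long, Event> old = byTime.headMap(cutoff, false);
        for (Event e : old.values()) byId.remove(e.id);
        old.clear();
    }
}
```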
Is there an even more efficient data structure that supports these operations?
Upvotes: 1
Views: 509
Reputation: 373002
Given that the operations you want to support are

- inserting an object,
- looking up an object by its ID, and
- deleting all objects dated before some point in time,
I think that you may want to use a data structure that's a combination of a chained hash table and a splay tree. The basic idea is as follows: the hash table maps ID values to the objects in question, and the splay tree stores the objects sorted by date. To insert into the structure, you insert the element into both the hash table and the splay tree, which takes O(log n) amortized time overall (the splay-tree insertion dominates). You can do lookups by ID in O(1) expected time via the hash table.
Depending on the parameters of your problem, you can potentially delete all elements that appear before some time extremely efficiently. The splay tree supports deleting everything whose time is before some particular value in amortized O(log n) time. Now suppose the entries you insert have the property that once you delete all values below some time, you never afterwards insert an entry dated before that time. Then you can use the following deletion procedure: first, delete all the relevant entries from the splay tree in total amortized O(log n) time, and record the cutoff time you just used. From that point forward, any time a hash-table lookup encounters an element whose time is below that threshold, delete it on the spot and treat the lookup as a miss. If you also purge stale entries whenever you rehash, then the O(n) work required to remove everything dead from the hash table is spread across many later operations instead of being paid all at once at the moment of deletion, and so it amortizes away. This still gives you O(1) lookup time by ID and O(1) amortized insertion time into the hash table. In short, if you can make this assumption, you'll get the following runtimes:

- insertion: O(log n) amortized (splay tree) plus O(1) amortized (hash table),
- lookup by ID: O(1) expected,
- deletion of everything before a given time: O(log n) amortized.
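A rough sketch of the lazy-deletion scheme, under the stated assumption that deleted times are never re-inserted. Java has no standard splay tree, so a TreeMap stands in for it here (both give O(log n) ordered operations, though the amortized bounds differ); the `Event` shape and all names are illustrative, and the rehash-time purge is omitted for brevity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Lazy deletion: old entries are removed eagerly from the ordered
// index, but purged from the hash table one at a time during lookups.
class LazyEventStore {
    static final class Event {
        final String id;
        final long time;
        Event(String id, long time) { this.id = id; this.time = time; }
    }

    private final Map<String, Event> byId = new HashMap<>();
    private final TreeMap<Long, Event> byTime = new TreeMap<>();
    private long threshold = Long.MIN_VALUE; // entries before this are dead

    void add(Event e) {
        byId.put(e.id, e);
        byTime.put(e.time, e);
    }

    // Drop old entries from the ordered index and record the cutoff;
    // the hash table is cleaned lazily as stale entries are looked up.
    void deleteBefore(long cutoff) {
        byTime.headMap(cutoff, false).clear();
        threshold = Math.max(threshold, cutoff);
    }

    Event lookup(String id) {
        Event e = byId.get(id);
        if (e != null && e.time < threshold) { // stale: purge, report a miss
            byId.remove(id);
            return null;
        }
        return e;
    }
}
```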
Hope this helps!
Upvotes: 1