I have a stream of more than 20 million values, each arriving with its corresponding key (more than 10 million distinct keys). Each key is associated with one or more values (at most 50,000). For example:
... (key1, val1), (key2, val2), (key1, val3), (key2, val4), (key1, val6), (key3, val5)...
I store this stream as follows:
key1 : val1, val3, val6
key2 : val2, val4
key3 : val5
Each time I receive a new value in the stream, I first check if this value appears in the list of its corresponding key:
My question is: what is the most efficient data structure or tool for this process? (I want to raise the flag as fast as possible.) I thought of a hash table of linked lists (as in the example above), but scanning the whole linked list every time I add a value does not seem right. Note that I do need to keep track of the LAST value.
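A minimal Python sketch of the storage scheme described above, assuming a plain dict of lists (a Python list stands in for the linked list). The function name `group_stream` and the choice to skip duplicate values are illustrative assumptions; the exact flag rule is not specified in the question, so the sketch only marks where it would go. The point it shows is the cost: `value not in values` scans the list, so each check is O(n) in that key's list length.

```python
from collections import defaultdict


def group_stream(stream):
    """Group (key, value) pairs into key -> list of values, in arrival order."""
    groups = defaultdict(list)
    for key, value in stream:
        values = groups[key]
        # `value not in values` scans the whole list: O(n) in this key's
        # list length (up to 50,000 values per key in the question).
        if value not in values:
            values.append(value)
        # else: duplicate value; this is where the question's flag would be
        # raised (its exact rule is not given, so nothing happens here).
    return groups


stream = [("key1", "val1"), ("key2", "val2"), ("key1", "val3"),
          ("key2", "val4"), ("key1", "val6"), ("key3", "val5")]
groups = group_stream(stream)
for key, values in groups.items():
    print(key, ":", ", ".join(values))
```

This reproduces the `key1 : val1, val3, val6` layout shown earlier, but every insert pays for a linear scan.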
Thank you
Checking whether the new value is in the list is not optimal: it takes O(n) time, where n is the length of that key's list.
Use a hash table instead, which gives O(1) expected-time lookups. Store the last value separately and update it on every insert.
So you have an outer hash table whose values are pairs. Each pair consists of an element (the last value inserted for that key) and a hash table used as a set (all values seen for that key).
Your example looks like this:
key1 -> (val6, (val1->1, val3->1, val6->1))
key2 -> (val4, (val2->1, val4->1))
key3 -> (val5, (val5->1))
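A minimal Python sketch of this structure, assuming a Python `dict` as the outer hash table and a built-in `set` as the inner one. The class name `LastValueIndex` and the methods `add`, `last` and `contains` are hypothetical, chosen for illustration. It also assumes that "last" means the most recently received value for the key, even when that value was already in the set; adjust `add` if your flag rule needs different semantics.

```python
class LastValueIndex:
    """Hypothetical sketch: key -> (last value, set of all values for that key)."""

    def __init__(self):
        # key -> [last_value, set_of_values]; a list so `last` can be updated in place
        self._data = {}

    def add(self, key, value):
        """Record `value` for `key` and return (already_present, previous_last).

        The membership test is an O(1) expected-time set lookup instead of an
        O(n) list scan.
        """
        entry = self._data.get(key)
        if entry is None:
            self._data[key] = [value, {value}]
            return False, None
        previous_last, values = entry
        already_present = value in values
        values.add(value)
        entry[0] = value  # assumption: the most recently received value is "last"
        return already_present, previous_last

    def last(self, key):
        return self._data[key][0]

    def contains(self, key, value):
        entry = self._data.get(key)
        return entry is not None and value in entry[1]


stream = [("key1", "val1"), ("key2", "val2"), ("key1", "val3"),
          ("key2", "val4"), ("key1", "val6"), ("key3", "val5")]
index = LastValueIndex()
for key, value in stream:
    index.add(key, value)

print(index.last("key1"))              # val6
print(index.contains("key2", "val4"))  # True
print(index.add("key1", "val3"))       # (True, 'val6')
```

Returning both the membership result and the previous last value from a single `add` call means the flag decision needs only one hash lookup per incoming pair.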
You can optimize the case where the set contains only one element by not storing the last value explicitly, since it must be that single element.
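One possible way to implement that optimization, sketched in Python under the same assumptions as a dict-of-(last, set) design: a key with a single value stores that value bare, and only gets promoted to a (last, set) record when a second distinct value arrives. The names `CompactIndex` and `_Multi` are hypothetical. With more than 10 million keys, many of which may have only one value, skipping a set per key is likely to save memory, though the actual saving depends on your data. Values must be hashable.

```python
_MISSING = object()  # sentinel so that a stored value of None is not mistaken for "absent"


class _Multi:
    """Record used only once a key has two or more distinct values."""
    __slots__ = ("last", "values")

    def __init__(self, last, values):
        self.last = last
        self.values = values


class CompactIndex:
    """Hypothetical sketch: key -> bare value (one value) or _Multi (several)."""

    def __init__(self):
        self._data = {}

    def add(self, key, value):
        """Record `value` for `key` and return (already_present, previous_last)."""
        entry = self._data.get(key, _MISSING)
        if entry is _MISSING:
            # Single value: store it bare; it is implicitly the last value.
            self._data[key] = value
            return False, None
        if isinstance(entry, _Multi):
            previous_last = entry.last
            already_present = value in entry.values
            entry.values.add(value)
            entry.last = value
            return already_present, previous_last
        # Entry is a single bare value.
        if value == entry:
            return True, entry  # still one element; nothing to promote
        # Second distinct value: promote to an explicit (last, set) record.
        self._data[key] = _Multi(value, {entry, value})
        return False, entry

    def last(self, key):
        entry = self._data[key]
        return entry.last if isinstance(entry, _Multi) else entry

    def contains(self, key, value):
        if key not in self._data:
            return False
        entry = self._data[key]
        if isinstance(entry, _Multi):
            return value in entry.values
        return entry == value


stream = [("key1", "val1"), ("key2", "val2"), ("key1", "val3"),
          ("key2", "val4"), ("key1", "val6"), ("key3", "val5")]
index = CompactIndex()
for key, value in stream:
    index.add(key, value)

print(index.last("key3"))  # val5 (stored bare, no set allocated)
print(index.last("key1"))  # val6
```

The trade-off is an `isinstance` check on every access in exchange for not allocating a set for single-value keys.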