bugs king
bugs king

Reputation: 564

What's the most efficient data structure for store huge number of sessions?

For every client, the server creates a session for that specific client. A session has a expire time of 1 day. So that will end up with up to billion of sessions.

Suppose I use a hash map, then look up will be fast when a client communicates with server. However, I need to erase those expired sessions, for example once an hour. During the erase, then it may take some time due to the huge number, and this will cause server not being able to handle communication from the client.

So are there any high performance solution for this? i.e. I don't want to lock the map for erasing expired ones.

Upvotes: 2

Views: 871

Answers (4)

Jim Mischel
Jim Mischel

Reputation: 134035

One way to handle this is to create a hash map to hold the sessions, and a MRU (most recently used) list. The MRU list is implemented as a doubly-linked list. Whenever a user accesses the site, his session is moved back to the top of the MRU list. Also, whenever a session is created, the system checks the last item in the MRU list to see if the oldest session has expired, so that you can delete it.

Or, you could delete all of the expired sessions at the end of the list.

In addition, you'll want to have your lookup code delete an expired session if it hasn't already been deleted.

So, when you get a request, the sequence of events looks something like this:

session = get session info from user token
if no session
    create session
    add to front of MRU list
else if session has expired
    delete from mru list
    remove from hash map
else // session has not expired
    move session to front of MRU list
end

// delete expired sessions
p = last item in MRU list
while p has expired
    prev = p->prev
    remove from MRU list
    delete from hash map
    p = prev
end

If you're worried that cleaning up expired sessions will lock your hash map for too long, set a limit on the number of expired sessions you'll remove at any one time. If you set it to only clean up two expired sessions when a new session is added, you'll minimize the amount of time your data structure is locked, and expired sessions won't linger too long.

Upvotes: 0

rici
rici

Reputation: 241861

Simple solution: use a hash table. When you are searching a bucket for an entry, delete any expired sessions you come across. This is almost free, since you are searching the chain anyway. It doesn't guarantee that sessions will be deleted immediately on expiry, but it is highly probable that the chain containing an expired session will be searched not long afterwards.

You should presize the hashtable to a fixed number of buckets representing what you expect to be the capacity of the server. That avoids the need to rehash, and that means that each bucket chain can be independently locked. You don't need a lock for every chain, though; you can use the same lock for several -- even many -- chains. Choose a number of locks sufficient that your expected lock contention will be low under peak request pressure; you can compute a good number based on the number of simultaneously active handler threads you have. A chain search will take very little time if the chain is memory-resident, so it will almost always complete before a context-switch. So "simultaneously active" means that they are actually mapped to a CPU and running, not just mapped to a kernel process. So with even a small vector of locks, you should be able to reduce bucket chain contention to a very low level.

Upvotes: 1

keith
keith

Reputation: 5352

Using a data structure is probably too simple if you have a very high number of sessions, you will need a slightly different approach.

Look into storing session data in Redis or another key value store. This would be more normal for servers with high load. Redis and most others offer persistence and don't have locking issues if you need to clear things out in the background.

Upvotes: 2

Dimitri Bosteels
Dimitri Bosteels

Reputation: 381

I don't think a map is really the best collection. With what you said in mind, I would go for a Set (an unordered one if you don't need an order). As you will never have 2 times the same Session they will all be different, and you don't really need an association which a map offers, or I didn't understand your problem correctly.

Upvotes: 1

Related Questions