muckabout
muckabout

Reputation: 1941

How does OS handle a python dict that's larger than memory?

I have a python program that is going to eat a lot of memory, primarily in a dict. This dict will be responsible for assigning a unique integer value to a very large set of keys. As I am working with large matrices, I need a key-to-index correspondence that can also be recovered from (i.e., once matrix computations are complete, I need to map the values back to the original keys).

I believe this amount will eventually surpass available memory. I am wondering how this will be handled with regards to swap space. Perhaps there is a better data structure for this purpose.

Upvotes: 4

Views: 1484

Answers (4)

zdav
zdav

Reputation: 2748

As far as I can remember, when a dict is expanded it just relies on C's malloc. The program will keep running as long as malloc keeps succeeding. Most OS's will keep malloc working as long as there is enough memory, and then as long as there are pages that can be swapped in. In either case Python will throw a MemoryError exception when malloc fails, as per the documentation. As far as the data structure goes, dict is going to be very efficient space-wise. The only way to really do better is to use an analytical function to map the values back and forth.

Upvotes: 1

Antti Huima
Antti Huima

Reputation: 25522

It will just end up in swap trashing, because a hash table has very much randomized memory access patterns.

If you know that the map exceeds the size of the physical memory, you could consider using a data structure on the disk in the first place. This especially if you don't need the data structure during the computation. When the hash table triggers swapping, it creates problems also outside the hash table itself.

Upvotes: 1

extraneon
extraneon

Reputation: 23950

Swap space is a kernel feature and transparant to the user (python).

If you do have a huge dict and don't need all the data at once, you could look at redis which might do what you want. Or maybe not :)

Upvotes: 2

Andrew McGregor
Andrew McGregor

Reputation: 34602

You need a database, if the data will exceed memory. The indexing of dictionaries isn't designed for good performance when a dictionary is bigger than memory.

Upvotes: 5

Related Questions