Reputation: 73
I have a rather large dictionary with about 40 million keys, which I naively stored just by writing {key: value, key: value, ...} into a text file. I didn't consider the fact that I could never realistically access this data, because Python has an aversion to loading and evaluating a 1.44 GB text file as a dictionary.
I know I could use something like shelve to be able to access the data without reading all of it at once, but I'm not sure how I would even convert this text file to a shelve file without regenerating all the data (which I would prefer not to do). Are there any better alternatives for storing, accessing, and potentially later changing this much data? If not, how should I go about converting this monstrosity over to a format usable by shelve?
If it matters, the dictionary is of the form {(int, int, int, int): [[int, int], bool]}
Upvotes: 3
Views: 2297
Reputation: 5329
https://github.com/dagnelies/pysos
It works like a normal Python dict, but has the advantage that it's much more efficient than shelve on Windows and is also cross-platform, unlike shelve, where the data storage differs based on the OS.
To install:
pip install pysos
Usage:
import pysos
db = pysos.Dict('somefile')
db['hello'] = 'persistence!'
Just to give a ballpark figure, here is a mini benchmark (on my Windows laptop):
import time
import pysos
t = time.time()
N = 100 * 1000
db = pysos.Dict("test.db")
for i in range(N):
    db["key_" + str(i)] = {"some": "object_" + str(i)}
db.close()
print('PYSOS time:', time.time() - t)
# => PYSOS time: 3.424309253692627
The resulting file was about 3.5 MB.
So, in your case, taking a conservative figure of roughly 1 minute per million key/value pairs, inserting all 40 million would take about 40 minutes, so almost an hour. Of course, the machine's specs can influence that a lot. It's just a very rough estimate.
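Whichever store you pick, the existing text file still has to be read without evaluating it in one go. Here is a minimal sketch of one way to do that, assuming every entry looks exactly like `(1, 2, 3, 4): [[5, 6], True]`. The names `iter_entries` and `convert_to_shelf` are made up for illustration, and the example writes to the standard library's shelve only because its API is well known; any dict-like store could be filled the same way.

```python
import re
import shelve

# Matches one entry such as "(1, 2, 3, 4): [[5, 6], True]".
# Assumption: keys are 4-tuples of ints, values are [[int, int], bool].
ENTRY = re.compile(
    r"\(\s*(-?\d+)\s*,\s*(-?\d+)\s*,\s*(-?\d+)\s*,\s*(-?\d+)\s*\)\s*:\s*"
    r"\[\s*\[\s*(-?\d+)\s*,\s*(-?\d+)\s*\]\s*,\s*(True|False)\s*\]"
)


def iter_entries(path, chunk_size=1 << 20):
    """Yield (key, value) pairs from the dict literal, one chunk at a time."""
    buffer = ""
    with open(path, "r") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buffer += chunk
            last_end = 0
            for m in ENTRY.finditer(buffer):
                a, b, c, d, x, y, flag = m.groups()
                key = (int(a), int(b), int(c), int(d))
                yield key, [[int(x), int(y)], flag == "True"]
                last_end = m.end()
            # Keep only the unmatched tail: it may hold an entry that was
            # cut in half by the chunk boundary.
            buffer = buffer[last_end:]


def convert_to_shelf(text_path, shelf_path):
    """Copy every entry into a shelve file and return how many were written."""
    count = 0
    with shelve.open(shelf_path) as db:
        for key, value in iter_entries(text_path):
            # shelve keys must be strings, so the tuple is stored as its repr.
            db[repr(key)] = value
            count += 1
    return count
```

A lookup then goes through the same string form, for example `db[repr((1, 2, 3, 4))]`. If the file contains entries that do not fit the assumed pattern, this sketch silently skips them, so it is worth comparing the returned count with the expected number of keys.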
Upvotes: 1
Reputation: 10094
Redis is an in-memory key-value store that can be used for this kind of problem.
There are several Python clients.
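The HMSET operation lets you insert multiple key-value pairs into a hash in one call (newer Redis versions deprecate it in favor of HSET, which accepts multiple pairs).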
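As a rough sketch of what batched inserts could look like, assuming the redis-py client, whose `hset(name, mapping=...)` call plays the role of HMSET: the function name `load_into_hash`, the `1:2:3:4` field format, and JSON-encoded values are all choices made here for illustration, not anything Redis requires.

```python
import json


def encode_key(key):
    """Turn an (int, int, int, int) tuple into a field name such as '1:2:3:4'."""
    return ":".join(str(part) for part in key)


def load_into_hash(client, hash_name, entries, batch_size=10_000):
    """Write (key, value) pairs into one Redis hash, batch_size fields per call.

    `client` is expected to behave like redis.Redis, i.e. to provide
    hset(name, mapping=...). Returns the number of fields written.
    """
    batch = {}
    total = 0
    for key, value in entries:
        # Redis stores strings/bytes, so the nested list is serialized as JSON.
        batch[encode_key(key)] = json.dumps(value)
        if len(batch) >= batch_size:
            client.hset(hash_name, mapping=batch)
            total += len(batch)
            batch = {}
    if batch:  # flush the final, partially filled batch
        client.hset(hash_name, mapping=batch)
        total += len(batch)
    return total


# Usage, assuming a Redis server on localhost and `pip install redis`:
#   import redis
#   client = redis.Redis()
#   load_into_hash(client, "mydict", [((1, 2, 3, 4), [[5, 6], True])])
#   json.loads(client.hget("mydict", "1:2:3:4"))
```

Sending the fields in batches keeps the number of round trips to the server low without building one enormous command.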
Upvotes: 1