Reputation: 17415
What are the top factors to consider when tuning inserts for a LevelDB store?
I'm inserting 500M+ records in the form of key/value pairs into a LevelDB store using the Python plyvel library, and I see a dramatic drop in speed as the number of records grows. I guess this is expected, but are there tuning measures I could look at to make it scale better?
Example code:
import plyvel

BATCHSIZE = 1000000

db = plyvel.DB('/tmp/lvldbSNP151/', create_if_missing=True)
wb = db.write_batch()

# items not in any key order
for i, (key, value) in enumerate(DBSNPfile, start=1):
    wb.put(key, value)
    if i % BATCHSIZE == 0:
        wb.write()
wb.write()  # write the final, partially filled batch
I've tried various batch sizes, which helps a bit, but I'm hoping there's something else I've missed. For example, can knowing the maximum length of a key (or value) be leveraged?
Upvotes: 4
Views: 2182
Reputation: 4037
(Plyvel author here.)
LevelDB keeps all database items in sorted order. Since you are writing in a random order, this effectively means that all parts of the database get rewritten all the time, because LevelDB has to merge SSTs in the background. As your database grows and you keep adding more items, this results in reduced write throughput.
I suspect that performance will not degrade as badly if you have better locality of your writes.
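For illustration only (not part of the original answer), here is a minimal sketch of what better write locality could look like in your case: sort the input by key before inserting, so keys arrive in the order LevelDB stores them and compaction has far less rewriting to do. It assumes the pairs fit in memory; for 500M records you would likely sort in chunks on disk (an external sort) first. DBSNPfile is the iterable from your question.

import plyvel

BATCHSIZE = 1000000

db = plyvel.DB('/tmp/lvldbSNP151/', create_if_missing=True)
wb = db.write_batch()

# Sorting up front means inserts happen in key order, so new SSTables
# overlap far less with existing data during compaction.
for i, (key, value) in enumerate(sorted(DBSNPfile), start=1):
    wb.put(key, value)
    if i % BATCHSIZE == 0:
        wb.write()
wb.write()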
Other ideas that may be worth trying out are:
write_buffer_size
max_file_size
block_size
.write_batch(sync=False)
The above can all be set from Python by passing extra keyword arguments to plyvel.DB() and to the .write_batch() method. See the API docs for details.
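For illustration, a minimal sketch of how these options might be passed from Python; the numeric values are placeholders for you to tune, not recommendations.

import plyvel

# Illustrative values only; tune for your workload and hardware.
db = plyvel.DB(
    '/tmp/lvldbSNP151/',
    create_if_missing=True,
    write_buffer_size=64 * 1024 * 1024,  # larger memtable before it is flushed
    max_file_size=64 * 1024 * 1024,      # larger SSTable files
    block_size=16 * 1024,                # larger data blocks inside SSTables
)

# sync=False avoids an fsync per batch (faster, but the most recent
# writes can be lost if the machine crashes).
wb = db.write_batch(sync=False)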
Upvotes: 5