BenMQ

Reputation: 876

Python - storing integer array to disk for efficient retrieval

I have a large integer array that I need to store in a file. What is the most efficient format for quick retrieval? I'm not concerned with write performance, only read performance. Is there a good solution other than json and pickle?

Upvotes: 0

Views: 410

Answers (2)

Wojciech Walczak

Reputation: 3599

msgpack will probably beat json at loading data; at least it did in my tests when loading many large files. Yet another possibility is HDF5 for Python:

HDF5 is an open-source library and file format for storing large amounts of numerical data, originally developed at NCSA. It is widely used in the scientific community for everything from NASA’s Earth Observing System to the storage of data from laboratory experiments and simulations. Over the past few years, HDF5 has rapidly emerged as the de-facto standard technology in Python to store large numerical datasets.

In your case I would go for HDF5.
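A minimal sketch of what that looks like with the h5py package (the file name `data.h5` and dataset name `ints` are arbitrary choices for illustration):

```python
import h5py
import numpy as np

arr = np.arange(10_000, dtype=np.int64)

# Write the array into an HDF5 dataset.
with h5py.File("data.h5", "w") as f:
    f.create_dataset("ints", data=arr)

# Read it back; slicing reads only the requested region from disk,
# so you never have to load the whole array at once.
with h5py.File("data.h5", "r") as f:
    first_ten = f["ints"][:10]
```

Because HDF5 datasets support partial reads, this scales well when the array is too large to hold in memory.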

Upvotes: 1

Clarus

Reputation: 2338

JSON and pickle are low-efficiency solutions here, as both require at best several memory copies to get your data in or out.

Keep your data binary if you want the best efficiency. The pure-Python approach is struct.unpack; it works, but it is a little kludgy because it still requires a memory copy.
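For example, using only the standard library (the file name `ints.bin` and the little-endian 4-byte int format are illustrative choices):

```python
import struct

data = [1, 2, 3, 40000, 123456789]

# Write: pack the integers as little-endian 4-byte signed ints.
with open("ints.bin", "wb") as f:
    f.write(struct.pack(f"<{len(data)}i", *data))

# Read: unpack the whole file back into a list of ints.
# Note the full file is read into `raw` first -- that's the extra copy.
with open("ints.bin", "rb") as f:
    raw = f.read()
loaded = list(struct.unpack(f"<{len(raw) // 4}i", raw))
```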

Even better is something like numpy.memmap, which maps your file directly onto a NumPy array: very fast and very memory-efficient, since pages are only loaded as you touch them. You can write the file using the same approach.
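A sketch of that approach (the file name `ints.dat` and int64 dtype are assumptions; the dtype just has to match on both sides):

```python
import numpy as np

# Write: dump the array as raw binary, no header or metadata.
arr = np.arange(1_000_000, dtype=np.int64)
arr.tofile("ints.dat")

# Read: map the file into memory. No copy happens here; the OS
# pages data in lazily as elements are accessed.
mm = np.memmap("ints.dat", dtype=np.int64, mode="r")
value = int(mm[123_456])
```

Because the file carries no dtype information, you must record the dtype (and shape, for multi-dimensional arrays) out of band; if keeping that metadata yourself is a burden, numpy.save/numpy.load store it in the file header for you, at a small cost in loading speed.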

Upvotes: 1
