Reputation: 858
I have a large file I need to load into a dataframe. I will need to work on it for a while. Is there a way of keeping it loaded in memory, so that if my script fails, I will not need to load it again?
Upvotes: 1
Views: 3529
Reputation: 42905
Here's an example of how one can keep variables in memory between runs. For persistent storage beyond RAM, I would recommend looking into HDF5. It's fast, simple, and allows for queries if necessary (see the docs). Pandas supports `.read_hdf()` and `.to_hdf()` methods, analogous to the `_csv()` ones, but significantly faster.
A simple illustration of storage and retrieval, including a query (adapted from the docs):

import pandas as pd

# build a small frame and append it to an HDF5 store (requires the PyTables package)
df = pd.DataFrame(dict(A=list(range(5)), B=list(range(5))))
df.to_hdf('store_tl.h5', key='table', append=True)

# query the store without loading everything: only rows with index > 2
pd.read_hdf('store_tl.h5', key='table', where=['index>2'])
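To address the original question directly (not re-parsing the large file after a crash), a common pattern is to cache the parsed DataFrame on disk the first time it is loaded and reuse the cache on every later run. A minimal sketch, with hypothetical file names; `to_pickle`/`read_pickle` are used here only so the snippet runs without the PyTables dependency, but `to_hdf`/`read_hdf` drop in the same way:

```python
import os
import pandas as pd

CACHE = 'big_file.pkl'  # hypothetical cache path

def load_data(csv_path='big_file.csv'):
    """Return the cached DataFrame if present, else parse the source and cache it."""
    if os.path.exists(CACHE):
        return pd.read_pickle(CACHE)   # fast reload after a failed run
    df = pd.read_csv(csv_path)         # slow initial parse
    df.to_pickle(CACHE)                # persist for the next run
    return df
```

The first run pays the full parse cost; any run after that (including one restarted after a failure) reads the binary cache instead, which is typically much faster than re-parsing a large CSV.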
Upvotes: 1