alessio palmieri

Reputation: 77

Use hard drive instead of RAM in Python

I'd like to know if there's a method or a Python package that lets me work with a large dataset without loading it entirely into RAM.

I'm also using pandas for statistical functions.

I need access to the entire dataset, because many statistical functions need all of the data to return credible results.

I'm using PyDev (with interpreter Python 3.4) on LiClipse with Windows 10.

Upvotes: 5

Views: 7953

Answers (2)

SerialDev

Reputation: 2847

You could use SFrame or Dask for large-dataset support, or stick with pandas and read/iterate over the file in chunks to minimise RAM usage. The blaze library is also worth a look.

Read in chunks:

import pandas as pd

chunksize = 10 ** 6  # rows per chunk
for chunk in pd.read_csv(filename, chunksize=chunksize):
    process(chunk)
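Since the question asks about statistics over the whole dataset, note that many statistics can be computed over chunks by accumulating partial results, so the full data never sits in RAM at once. A minimal sketch of a streaming mean (the temporary file and the "value" column are made up for illustration):

import csv
import os
import tempfile

import pandas as pd

# Build a small example CSV on disk (stands in for a large dataset).
tmp = tempfile.NamedTemporaryFile(mode="w", suffix=".csv",
                                  delete=False, newline="")
writer = csv.writer(tmp)
writer.writerow(["value"])
for i in range(10):
    writer.writerow([i])
tmp.close()

# Stream the file in chunks, accumulating a running sum and row count;
# the mean over the full dataset never needs all rows in memory.
total = 0.0
count = 0
for chunk in pd.read_csv(tmp.name, chunksize=4):  # tiny chunks for the demo
    total += chunk["value"].sum()
    count += len(chunk)

mean = total / count
os.unlink(tmp.name)

The same pattern (combine per-chunk partials) works for sums, counts, min/max, and variance via running moments; statistics such as exact medians genuinely need all the data or a two-pass/approximate algorithm.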

Upvotes: 3

Emil Vikström

Reputation: 91942

If all you need is to treat the disk as an extension of RAM, you might set up a swap file (a page file on Windows). The kernel will then swap pages in and out automatically, using heuristics to decide which pages should stay in memory and which should go to disk.
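A related, in-process alternative to OS-level swapping is a memory-mapped array: numpy.memmap keeps the array in a file on disk and the OS pages in only the parts you touch. A sketch under that assumption (the file path is illustrative):

import os
import tempfile

import numpy as np

path = os.path.join(tempfile.gettempdir(), "big_array.dat")
n = 10 ** 6

# Create a disk-backed array; only the pages actually accessed
# are pulled into RAM by the operating system.
arr = np.memmap(path, dtype="float64", mode="w+", shape=(n,))
arr[:] = np.arange(n)
arr.flush()  # push pending writes out to the file
del arr      # close the mapping

# Reopen read-only; reductions stream pages from disk as needed.
ro = np.memmap(path, dtype="float64", mode="r", shape=(n,))
mean = float(ro.mean())
del ro
os.remove(path)

Unlike a swap file, this needs no system configuration, but it only helps for array-shaped data you access through NumPy (or pandas built on top of it).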

Upvotes: 0

Related Questions