Reputation: 8659
Is there a way to stream and process large CSV or Excel files in pandas without taking up large amounts of memory?
What I do right now is load the file like this:
data = pd.read_csv('SUPERLARGEFILE.csv', index_col=0, encoding = "ISO-8859-1", low_memory=False)
Perform some task
data.to_csv('Results.csv', sep=',')
If I were working on a computer with a small amount of memory, is there a way to stream and process large data files with an iterative function, doing something like:
Load the first 1000 rows and store them in memory
Perform some task
Save the results
Load the next 1000 rows, overwriting the previous chunk in memory
Perform the task
Append the results to the save file
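The closest thing I can picture is dropping down to the plain csv module and handling the batching myself, something like the rough sketch below (do_task is just a placeholder for whatever processing I actually need), but I would much rather stay in pandas:
import csv
from itertools import islice

def do_task(rows):
    # placeholder for whatever processing I actually need to run
    return rows

with open('SUPERLARGEFILE.csv', newline='', encoding='ISO-8859-1') as src, \
     open('Results.csv', 'w', newline='') as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    writer.writerow(next(reader))           # copy the header row across
    while True:
        block = list(islice(reader, 1000))  # next 1000 rows, replacing the previous block in memory
        if not block:
            break
        writer.writerows(do_task(block))    # append the processed rows to the results file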
Upvotes: 1
Views: 652
Reputation: 2202
Just add the chunksize argument to your code:
data = pd.read_csv('SUPERLARGEFILE.csv', index_col=0, encoding="ISO-8859-1", low_memory=False, chunksize=10)
result = []
for chunk in data:  # iterate over chunks of 10 rows each
    result.append(chunk.mean())  # example per-chunk task
# do something with result, e.g. pd.DataFrame(result).to_csv("result.csv")
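If even the list of per-chunk results would be too big to hold, you can also write each processed chunk straight to the output file as you go. A minimal sketch along those lines (the chunk is passed through unchanged here; replace that line with your actual task):
import pandas as pd

reader = pd.read_csv('SUPERLARGEFILE.csv', index_col=0, encoding="ISO-8859-1",
                     low_memory=False, chunksize=1000)
for i, chunk in enumerate(reader):
    processed = chunk  # replace with your actual per-chunk processing
    # the first chunk creates the file with a header, later chunks append without one
    processed.to_csv('Results.csv', mode='w' if i == 0 else 'a', header=(i == 0))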
Upvotes: 1