Reputation: 8659
Is there a way to stream and process large CSV or Excel files in pandas without taking up large amounts of memory?
What I do right now is load the file like this:
data = pd.read_csv('SUPERLARGEFILE.csv', index_col=0, encoding = "ISO-8859-1", low_memory=False)
Perform some task
data.to_csv('Results.csv', sep=',')
If I were working on a computer with a small amount of memory, is there a way to stream and process large data files with an iterative function, doing something like:
Load the first 1000 rows and store them in memory
Perform some task
Save the results
Load the next 1000 rows, overwriting the previous chunk in memory
Perform the task
Append the results to the save file
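The closest thing I can picture is dropping down to the plain csv module and handling the batching myself, something like the rough sketch below (do_task is just a placeholder for whatever processing I actually need), but I would much rather stay in pandas:
import csv
from itertools import islice

def do_task(rows):
    # placeholder for whatever processing I actually need to run
    return rows

with open('SUPERLARGEFILE.csv', newline='', encoding='ISO-8859-1') as src, \
     open('Results.csv', 'w', newline='') as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    writer.writerow(next(reader))           # copy the header row across
    while True:
        block = list(islice(reader, 1000))  # next 1000 rows, replacing the previous block in memory
        if not block:
            break
        writer.writerows(do_task(block))    # append the processed rows to the results file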
Upvotes: 1
Views: 652
Reputation: 2202
Just add the chunksize argument to your code:
data = pd.read_csv('SUPERLARGEFILE.csv', index_col=0, encoding="ISO-8859-1", low_memory=False, chunksize=10)
result = []
for chunk in data:  # iterate over chunks of 10 rows each
    result.append(chunk.mean())  # example per-chunk task
# do something with result, e.g. pd.DataFrame(result).to_csv("result.csv")
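If even the list of per-chunk results would be too big to hold, you can also write each processed chunk straight to the output file as you go. A minimal sketch along those lines (the chunk is passed through unchanged here; replace that line with your actual task):
import pandas as pd

reader = pd.read_csv('SUPERLARGEFILE.csv', index_col=0, encoding="ISO-8859-1",
                     low_memory=False, chunksize=1000)
for i, chunk in enumerate(reader):
    processed = chunk  # replace with your actual per-chunk processing
    # the first chunk creates the file with a header, later chunks append without one
    processed.to_csv('Results.csv', mode='w' if i == 0 else 'a', header=(i == 0))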
Upvotes: 1