Reputation: 1391
I am trying to convert a large (~100 MB) CSV file to XLSX using Python. I tried the solution below, but ran into memory issues:
https://stackoverflow.com/a/62750402/1266723
The solution below solves the memory issue by writing in chunks, but each pass through the loop overwrites the file. Can someone please recommend how to append to the same file instead of creating new files or overwriting it?
https://stackoverflow.com/a/71237366/1266723
import pandas as pd

n = 1000  # number of rows per chunk
df = pd.read_csv("myFile.csv")
for i in range(0, df.shape[0], n):
    # each iteration re-creates new_file.xlsx, so only the last chunk survives
    df[i:i+n].to_excel("new_file.xlsx", index=False, header=False)
Upvotes: 0
Views: 807
Reputation: 149095
You can use an ExcelWriter to do multiple writes to the same xlsx file.
You could even read the CSV file in chunks:
import pandas as pd

n = 1000  # chunksize
with pd.read_csv("myFile.csv", chunksize=n) as reader, \
        pd.ExcelWriter('new_file.xlsx') as writer:
    for i, df in enumerate(reader):
        # every chunk has n rows, so chunk i starts at row i * n
        df.to_excel(writer, index=False, header=False, startrow=i * n)
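Both snippets drop the CSV header row (header=False). If you also want the column names in the xlsx file, something like this should work (untested sketch): write the header with the first chunk only and start every later chunk one row lower to make room for it:

n = 1000  # chunksize
with pd.read_csv("myFile.csv", chunksize=n) as reader, \
        pd.ExcelWriter('new_file.xlsx') as writer:
    for i, df in enumerate(reader):
        # chunk 0 writes the header at row 0 and data at rows 1..n;
        # later chunks start one row lower to account for the header
        df.to_excel(writer, index=False, header=(i == 0),
                    startrow=i * n + (1 if i > 0 else 0))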
The above works in recent pandas versions, but read_csv has only returned a context manager since version 1.2, so if you are using an older version, you should use a try: ... finally: ... block instead:
n = 1000  # chunksize
# no with-statement on the reader: pandas < 1.2 requires closing it manually
reader = pd.read_csv("myFile.csv", chunksize=n)
try:
    with pd.ExcelWriter('new_file.xlsx') as writer:
        for i, df in enumerate(reader):
            df.to_excel(writer, index=False, header=False, startrow=i * n)
finally:
    reader.close()
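Side note on memory, since that was your original problem: the writer side also keeps the whole workbook in memory until it is saved. If that is still too much, xlsxwriter has a constant_memory mode that flushes rows to disk as they are written; it only works when rows are written in increasing order, which the chunked loop above already guarantees. A sketch, assuming xlsxwriter is installed and your pandas supports engine_kwargs (1.3+):

import pandas as pd

n = 1000  # chunksize
with pd.read_csv("myFile.csv", chunksize=n) as reader, \
        pd.ExcelWriter('new_file.xlsx', engine='xlsxwriter',
                       engine_kwargs={'options': {'constant_memory': True}}) as writer:
    for i, df in enumerate(reader):
        # constant_memory requires rows in increasing order; sequential
        # chunks with increasing startrow satisfy that
        df.to_excel(writer, index=False, header=False, startrow=i * n)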
Upvotes: 1