Reputation: 565
I have a tool where users can upload large xlsx files. We need to convert these xlsx files to csv for processing. However, I have not found a fast way to convert an xlsx file to csv. We are not able to use a VBS script (which was very fast). I tried various approaches, such as pandas and openpyxl:
pandas
import os
import pandas as pd

read_file = pd.read_excel(os.path.join(path, old_filename), engine="openpyxl")
read_file.to_csv(os.path.join(path, new_filename), index=False, header=True)
openpyxl
import csv
import os
import openpyxl

wb = openpyxl.load_workbook(file, data_only=True)
sh = wb.active  # was .get_active_sheet()
with open(os.path.join(path, filename), 'w', newline="") as f:
    c = csv.writer(f)
    for r in sh.iter_rows():  # generator; was sh.rows
        c.writerow([cell.value for cell in r])
But converting a 60 MB xlsx file to csv this way takes about 4 minutes.
Is there a way to make the conversion faster? I am open to any solution.
Upvotes: 0
Views: 1133
Reputation: 1121
Do not iterate over every cell inside each row; copy each row whole.
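# Note: nrows and row_values() are xlrd sheet APIs, not openpyxl ones,
# so sh must be an xlrd sheet for this loop to work.
for rownum in range(sh.nrows):
    c.writerow(sh.row_values(rownum))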
https://stackoverflow.com/a/20105297/13000953
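If you want to stay with openpyxl, the same "copy whole rows" idea can be sketched roughly like this. It assumes openpyxl is installed and that only the active sheet is needed. The function names `write_rows` and `xlsx_to_csv` are made up for illustration. Opening the workbook with `read_only=True` streams rows instead of building every cell object in memory, and `iter_rows(values_only=True)` yields plain tuples of values, so no per-cell list comprehension is needed. How much time this saves on a 60 MB file is something you would have to measure.

```python
import csv


def write_rows(rows, csv_path):
    """Write an iterable of row tuples to csv_path, one CSV line per row."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        # writerows consumes the iterable lazily, row by row
        csv.writer(f).writerows(rows)


def xlsx_to_csv(xlsx_path, csv_path):
    """Convert the active sheet of an xlsx file to CSV (hypothetical helper)."""
    # Imported here so write_rows stays usable without openpyxl installed
    from openpyxl import load_workbook

    # read_only=True streams the sheet; data_only=True returns cached
    # formula results instead of the formula text
    wb = load_workbook(xlsx_path, read_only=True, data_only=True)
    try:
        # values_only=True yields tuples of cell values, not Cell objects
        write_rows(wb.active.iter_rows(values_only=True), csv_path)
    finally:
        # read-only workbooks keep the file handle open until closed
        wb.close()
```

Usage would be something like `xlsx_to_csv(os.path.join(path, old_filename), os.path.join(path, new_filename))`. Empty cells come through as `None`, which `csv.writer` writes as an empty field.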
Upvotes: 1