Reputation: 585
I have a 40MB dataframe 'dfScore' I am writing to .xlsx。 the code is as follow,
writer = pandas.ExcelWriter('test.xlsx', engine='xlsxwriter')
dfScore.to_excel(writer,sheet_name='Sheet1')
writer.save()
the code dfScore.to_excel
take almost an hour ,the code writer.save()
takes another hour. Is this normal? Is there a good way to take less than 10 min?
i already searched in stackoverflow ,but it seems some suggestions not working on my problem.
Upvotes: 12
Views: 22730
Reputation: 91
Why don't you save it as .csv? I have worked with heavier DataFrames on my personal laptop and I had the same problem with writing to xlsx.
your_dataframe.to_csv('my_file.csv',encoding='utf-8',columns=list_of_dataframe_columns)
then you can simply convert it to .xlsx with MS Excel or an online convertor.
Upvotes: 8
Reputation: 41554
the code dfScore.to_excel take almost an hour ,the code writer.save() takes another hour. Is this normal?
That sounds a bit too high. I ran an XlsxWriter test writing 1,000,000 rows x 5 columns and it took ~ 100s. The time will vary based on the CPU and Memory of the test machine but 1 hour is 36 times slower which doesn't seem right.
Note, Excel, and thus XlsxWriter, only supports 1,048,576 rows per worksheet so you are effectively throwing away 3/4s of your data and wasting time doing it.
Is there a good way to take less than 10 min?
For pure XlsxWriter programs pypy gives a good speed up. For example rerunning my 1,000,000 rows x 5 columns testcase with pypy the time went from 99.15s to 16.49s. I don't know if Pandas works with pypy though.
Upvotes: -1