inaMinute
inaMinute

Reputation: 585

write dataframe to .xlsx too slow

I have a 40MB dataframe 'dfScore' I am writing to .xlsx。 enter image description here the code is as follow,

writer = pandas.ExcelWriter('test.xlsx', engine='xlsxwriter')
dfScore.to_excel(writer,sheet_name='Sheet1')
writer.save()

the code dfScore.to_excel take almost an hour ,the code writer.save() takes another hour. Is this normal? Is there a good way to take less than 10 min?

i already searched in stackoverflow ,but it seems some suggestions not working on my problem.

Upvotes: 12

Views: 22730

Answers (2)

payam.1991
payam.1991

Reputation: 91

Why don't you save it as .csv? I have worked with heavier DataFrames on my personal laptop and I had the same problem with writing to xlsx.

your_dataframe.to_csv('my_file.csv',encoding='utf-8',columns=list_of_dataframe_columns)

then you can simply convert it to .xlsx with MS Excel or an online convertor.

Upvotes: 8

jmcnamara
jmcnamara

Reputation: 41554

the code dfScore.to_excel take almost an hour ,the code writer.save() takes another hour. Is this normal?

That sounds a bit too high. I ran an XlsxWriter test writing 1,000,000 rows x 5 columns and it took ~ 100s. The time will vary based on the CPU and Memory of the test machine but 1 hour is 36 times slower which doesn't seem right.

Note, Excel, and thus XlsxWriter, only supports 1,048,576 rows per worksheet so you are effectively throwing away 3/4s of your data and wasting time doing it.

Is there a good way to take less than 10 min?

For pure XlsxWriter programs pypy gives a good speed up. For example rerunning my 1,000,000 rows x 5 columns testcase with pypy the time went from 99.15s to 16.49s. I don't know if Pandas works with pypy though.

Upvotes: -1

Related Questions