Reputation: 114
I have a dataFrame with around 28 millions rows (5 columns) and I'm struggling to write that to an excel, which is limited to 1,048,576 rows, I can't have that in more than one workbook so I'll need to split thoes 28Mi into 28 sheets and so on.
this is what I'm doing with it:
writer = pd.ExcelWriter('NAME_XX-' + mes +'-' + ano + '_XXX.xlsx', engine = "xlsxwriter")
notMor.to_excel(writer, engine='xlsxwriter', index=False)
I tought of splitting the dataframe into 27 different ones and then saving each one on the workbook but there isn't a simpler way?
Upvotes: 0
Views: 14332
Reputation: 180
I had the same problem and I had implemented the following. It's my pleasure if this helps you at all.
rows_per_sheet = 100000
number_of_sheets = floor((len(data)/rows_per_sheet))+1
start_index=0
end_index = rows_per_sheet
writer = pd.ExcelWriter(filename)
for i in range(number_of_sheets):
df = pd.DataFrame(list(data[start_index:end_index]), columns=columns)
df.to_excel(writer, index=False, sheet_name='sheet_'+str(i))
start_index = end_index
end_index = end_index + rows_per_sheet
writer.save()
However, to create multiple sheets on a same file would cost you a lot of time as every time it needs to load the original file which gets incremented every time. In this case you can create multiple excel file instead of multiple sheets-
rows_per_file = 1000000
number_of_files = floor((len(data)/rows_per_file))+1
start_index=0
end_index = rows_per_file
df = pd.DataFrame(list(data), columns=columns)
for i in range(number_of_files):
filepart = 'file' + '_'+ str(i) + '.xlsx'
writer = pd.ExcelWriter(filepart)
df_mod = df.iloc[start_index:end_index]
df_mod.to_excel(writer, index=False, sheet_name='sheet')
start_index = end_index
end_index = end_index + rows_per_file
writer.save()
Upvotes: 0
Reputation: 62493
csv
, and use the Excel Data Model, which has a row limit of 1,999,999,997
, to import the file.
Upvotes: 5
Reputation: 13
simply / easy / fast --> download your data to a CSV split by 1000000 from Pandas and then have Excel open it. Excel will adjust and open across multiple sheets accordingly.
Some code you can tweak:
#split file code
lines_per_file = 1000000
smallfile = None
with open('Plan.txt') as bigfile:
for lineno, line in enumerate(bigfile):
if lineno % lines_per_file == 0:
if smallfile:
smallfile.close()
small_filename = 'small_file_{}.txt'.format(lineno + lines_per_file)
smallfile = open(small_filename, "w")
smallfile.write(line)
if smallfile:
smallfile.close()
Upvotes: -1