Reputation: 67
I need a script to split a master worksheet (that contains over 50K rows) into separate worksheets that contain only 40 rows with no headings
After a bit of research I’ve managed to create a script that splits the master worksheet. However, each of the worksheets contains the original heading and the rows are not split into 40 rows per worksheet.
I believe that when you split a worksheet using panda with data frames they will always contain a heading? Any suggestions on how my python script can be modified to achieve what I need or is there a simpler way to achieve this without the need to use pandas and data frames?
Here is a link: https://github.com/lblake/sample-data to the some sample data
path = input('Enter file path to workbook name and extension,
e.g. example.xlsx: ')
chunksize = int (input('Enter the row number you want to split the excel sheet at: ') )
destination = input('Enter folder path to where you want the split files stored. Press Enter to save in current location: ')
i = 0
df = pd.read_excel(path)
for chunk in np.array_split(df, len(df) // chunksize):
chunk.to_excel(destination +
'file_{:02d}.xlsx'.format(i), index=True)
i += 1
Upvotes: 2
Views: 7029
Reputation: 164623
You can use groupby
and iterate. To ignore headers, specify header=False
when writing to a pd.ExcelWriter
object. The below example splits a dataframe of 10 rows into 2-row chunks.
df = pd.DataFrame(np.arange(100).reshape((10, 10)))
writer = pd.ExcelWriter('file.xlsx')
for key, grp in df.groupby(df.index // 2):
grp.to_excel(writer, f'sheet_{key}', header=False)
writer.save()
Upvotes: 1
Reputation: 484
I just copied your code and added header=False
path = input('Enter file path to workbook name and extension,
e.g. example.xlsx: ')
chunksize = int (input('Enter the row number you want to split the excel sheet at: ') )
destination = input('Enter folder path to where you want the split files stored. Press Enter to save in current location: ')
i = 0
df = pd.read_excel(path)
for chunk in np.array_split(df, len(df) // chunksize):
chunk.to_excel(destination +
'file_{:02d}.xlsx'.format(i), index=True, header=False)
i += 1
and it worked for me.
Upvotes: 1