pythlanx
pythlanx

Reputation: 67

Using Python And Pandas To Split Excel Worksheet Into Separate Worksheets

I need a script to split a master worksheet (that contains over 50K rows) into separate worksheets that contain only 40 rows with no headings

After a bit of research I’ve managed to create a script that splits the master worksheet. However, each of the worksheets contains the original heading and the rows are not split into 40 rows per worksheet.

I believe that when you split a worksheet using panda with data frames they will always contain a heading? Any suggestions on how my python script can be modified to achieve what I need or is there a simpler way to achieve this without the need to use pandas and data frames?

Here is a link: https://github.com/lblake/sample-data to the some sample data

path = input('Enter file path to workbook name and extension, 
e.g. example.xlsx: ')
chunksize = int (input('Enter the row number you want to split the excel sheet at: ') )
destination = input('Enter folder path to where you want the split files stored. Press Enter to save in current location: ')

i = 0
df = pd.read_excel(path)
for chunk in np.array_split(df, len(df) // chunksize):
    chunk.to_excel(destination + 
'file_{:02d}.xlsx'.format(i), index=True)
i += 1 

Upvotes: 2

Views: 7029

Answers (2)

jpp
jpp

Reputation: 164623

You can use groupby and iterate. To ignore headers, specify header=False when writing to a pd.ExcelWriter object. The below example splits a dataframe of 10 rows into 2-row chunks.

df = pd.DataFrame(np.arange(100).reshape((10, 10)))

writer = pd.ExcelWriter('file.xlsx')

for key, grp in df.groupby(df.index // 2):
    grp.to_excel(writer, f'sheet_{key}', header=False)

writer.save()

Upvotes: 1

Rodwan Bakkar
Rodwan Bakkar

Reputation: 484

I just copied your code and added header=False

path = input('Enter file path to workbook name and extension, 
e.g. example.xlsx: ')
chunksize = int (input('Enter the row number you want to split the excel sheet at: ') )
destination = input('Enter folder path to where you want the split files stored. Press Enter to save in current location: ')

i = 0
df = pd.read_excel(path)
for chunk in np.array_split(df, len(df) // chunksize):
    chunk.to_excel(destination + 
'file_{:02d}.xlsx'.format(i), index=True, header=False)
i += 1 

and it worked for me.

Upvotes: 1

Related Questions