Reputation: 10021
Let's say I have an excel file with 101
rows, I need to split and write into 11
excel files with equivalent row number 10
for each new file, except the last one since there is only one row left.
This is code I have tried, but I get KeyError: 11
:
df = pd.DataFrame(data=np.random.rand(101, 3), columns=list('ABC'))
groups = df.groupby(int(len(df.index)/10) + 1)
for i, g in groups:
g.to_excel("%s.xlsx" % i, index = False, index_lable = False)
Someone could help with this issue? Thanks a lot.
Reference related: Split pandas dataframe into multiple dataframes with equal numbers of rows
Upvotes: 2
Views: 1845
Reputation: 128
I solved a similar problem as follows. Backstory to my issue was that I have created an Azure Function with an HTTP trigger, but was overwhelming the endpoint when iterating through 2k rows of requests. So chunked up the origin file into rows of 50:
import pandas as pd
import logging
INXL = pd.read_excel('split/031022.xlsx', engine="openpyxl")
row_count = (len(INXL.index))
#make sure we are dealing with a table bigger than 50
if row_count >= 51:
row_start = (row_count -50)
else:
row_start = 1
def extract(rs, rc):
while rc >= 51: #loop body
# set the extraction to be between the row start and ending index
row_extract = INXL.iloc[rs:rc]
with pd.ExcelWriter(f'output_{rc}.xlsx') as writer:
row_extract.to_excel(writer,index=False)
rc -= 50
rs -= 50
extract(row_start, row_count)
if row_count < 51:
row_extract = INXL.iloc[row_start:row_count]
with pd.ExcelWriter(f'output_{row_count}.xlsx') as writer:
row_extract.to_excel(writer,index=False)
logging.info("extract completed")
Upvotes: 1
Reputation: 862671
I think you need np.arange
:
df = pd.DataFrame(data=np.random.rand(101, 3), columns=list('ABC'))
groups = df.groupby(np.arange(len(df.index))//10)
for i, g in groups:
print(g)
Upvotes: 2