Reputation: 824
I have a one-column DataFrame with 37365 rows. I need to split it into chunks like the ones below:
df[0:2499]
df[2500:4999]
df[5000:7499]
...
df[32500:34999]
df[35000:37364]
The idea is to use these chunks in a loop like the one below (process_operation does not work for DataFrames larger than 2500 rows):
while chunk < len(df):
    process_operation(df[lower:upper])
EDIT: I will have different DataFrames as inputs, and some of them will have fewer than 2500 rows. What would be the best approach to also capture these?
E.g.: df[0:1234] because 1234 < 2500
Upvotes: 2
Views: 7807
Reputation: 1
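I would use:
import numpy as np
import math
chunk_max_size = 2500
# Number of chunks needed so that no chunk exceeds chunk_max_size rows
chunks = int(math.ceil(len(df) / chunk_max_size))
for df_chunk in np.array_split(df, chunks):
    # here: len(df_chunk) <= 2500
    process_operation(df_chunk)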
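One thing to keep in mind: np.array_split balances the chunk sizes instead of filling each chunk to 2500 rows, and it raises an error when asked for 0 sections (which is what an empty df would produce here). The sketch below reproduces only the size arithmetic in plain Python so it runs without NumPy; array_split_sizes is a hypothetical helper name, not part of any library.

```python
import math

def array_split_sizes(n_rows, chunk_max_size):
    """Row counts of the chunks np.array_split would return (hypothetical helper)."""
    if n_rows == 0:
        return []  # guard: 0 sections is not a valid request for np.array_split
    n_chunks = math.ceil(n_rows / chunk_max_size)
    base, extra = divmod(n_rows, n_chunks)
    # The first `extra` chunks receive one additional row each.
    return [base + 1] * extra + [base] * (n_chunks - extra)

sizes = array_split_sizes(37365, 2500)
print(len(sizes), set(sizes))
```

For the 37365-row case this gives 15 chunks of 2491 rows each, so every chunk stays under the limit, but the boundaries do not fall at multiples of 2500. A DataFrame with 1234 rows comes back as a single chunk.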
Upvotes: 0
Reputation: 148870
The range function is enough here:
for start in range(0, len(df), 2500):
    process_operation(df[start:start+2500])
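This also covers the inputs with fewer than 2500 rows mentioned in the EDIT, because slicing past the end simply returns whatever rows are left. A minimal sketch of the same loop wrapped in a generator follows; iter_chunks is a hypothetical helper name, and a plain list stands in for the DataFrame so the example runs without pandas.

```python
def iter_chunks(data, chunk_size=2500):
    """Yield consecutive slices of at most chunk_size rows (hypothetical helper)."""
    for start in range(0, len(data), chunk_size):
        # Slicing past the end is safe: the last slice is just shorter.
        yield data[start:start + chunk_size]

# Stand-ins for DataFrames of different lengths.
rows = list(range(37365))
small = list(range(1234))

print([len(chunk) for chunk in iter_chunks(rows)][-1])  # size of the final, partial chunk
print([len(chunk) for chunk in iter_chunks(small)])     # a single chunk with all rows
```

With a real DataFrame the call would look like `for chunk in iter_chunks(df): process_operation(chunk)`, assuming df[start:stop] slices rows by position as it does in the loop above.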
Upvotes: 5
Reputation: 24
Do you mean something like this?
lower = 0
upper = 2500
while lower < len(df):
    # The slice end is exclusive, and slicing past the end just returns the remaining rows
    process_operation(df[lower:upper])
    lower += 2500
    upper += 2500
Upvotes: 0