Javi Torre

Reputation: 824

Python divide dataframe into chunks

I have a one-column df with 37365 rows. I need to split it into chunks like the ones below:

df[0:2499]
df[2500:4999]
df[5000:7499]
...
df[32500:34999]
df[35000:37364]

The idea is to use these chunks in a loop like the one below (process_operation does not work for dfs larger than 2500 rows):

while chunk <len(df):
    process_operation(df[lower:upper])

EDIT: I will have different dataframes as inputs. Some of them will have fewer than 2500 rows. What would be the best approach to also handle these?

E.g.: df[0:1234] because 1234 < 2500

Upvotes: 2

Views: 7807

Answers (3)

Vitaly Mirkis

Reputation: 1

I would use np.array_split:

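import math
import numpy as np

chunk_max_size = 2500
# number of chunks needed; at least 1 so an empty df does not raise
chunks = max(1, math.ceil(len(df) / chunk_max_size))
for df_chunk in np.array_split(df, chunks):
    # each df_chunk has at most chunk_max_size rows
    process_operation(df_chunk)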
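
One thing worth knowing about this approach: np.array_split does not produce chunks of exactly 2500 rows plus a remainder. Its documentation says it splits into near-equal parts (the first len % n parts get one extra row). The sketch below reproduces that arithmetic in plain Python so the resulting sizes are easy to inspect. The helper name balanced_bounds is made up for illustration and is not part of NumPy or pandas.

```python
import math

def balanced_bounds(n_rows, chunk_max_size=2500):
    """Return (start, stop) pairs for near-equal chunks, none above chunk_max_size."""
    if n_rows == 0:
        return []
    n_chunks = math.ceil(n_rows / chunk_max_size)
    base, extra = divmod(n_rows, n_chunks)
    bounds, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional row each.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds

bounds = balanced_bounds(37365)
sizes = [stop - start for start, stop in bounds]
print(len(sizes), min(sizes), max(sizes))  # 15 2491 2491
print(balanced_bounds(1234))  # [(0, 1234)]
```

With a real DataFrame, each pair could be applied as df.iloc[start:stop]. For the 37365-row case this gives 15 chunks of 2491 rows each, rather than 14 chunks of 2500 and one of 2365.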

Upvotes: 0

Serge Ballesta

Reputation: 148870

The range function is enough here:

for start in range(0, len(df), 2500):
    process_operation(df[start:start+2500])
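
This also covers the case from the EDIT, because slicing past the end of a sequence simply returns a shorter slice. A minimal sketch to check the chunk sizes follows. It uses plain lists as stand-ins for the dataframes, and the generator name iter_chunks is a hypothetical helper, not a pandas function.

```python
def iter_chunks(data, chunk_size=2500):
    """Yield consecutive slices of at most chunk_size items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(data), chunk_size):
        # Slicing past the end is safe: the last slice is simply shorter.
        yield data[start:start + chunk_size]

# Stand-ins for the real data: plain lists with the row counts from the question.
big = list(range(37365))
small = list(range(1234))

big_sizes = [len(chunk) for chunk in iter_chunks(big)]
small_sizes = [len(chunk) for chunk in iter_chunks(small)]

print(len(big_sizes), big_sizes[0], big_sizes[-1])  # 15 2500 2365
print(small_sizes)  # [1234]
```

The same generator should work on a DataFrame, since df[start:stop] with integer bounds slices rows by position, so the loop becomes: for chunk in iter_chunks(df): process_operation(chunk).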

Upvotes: 5

Deniz Polat

Reputation: 24

Do you mean something like this?

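lower = 0
upper = 2500  # slice ends are exclusive, so this covers rows 0 to 2499

# loop on lower so the final, shorter chunk is processed too
while lower < len(df):
    process_operation(df[lower:upper])
    lower += 2500
    upper += 2500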

Upvotes: 0
