James Chen

Reputation: 89

Loop to convert a DataFrame to NumPy arrays chunk by chunk

Is it possible to convert a DataFrame to a NumPy array chunk by chunk with a loop? Something like this, in pseudocode:

counter = 0
for index, row in dataframe.iterrows():
    if row['column'] == 1:
        counter += 1
        if counter == 10:
            # take the part of the dataframe where counter is <= 10,
            # convert it to numpy, and restart the process at the next row
            counter = 0

Upvotes: 0

Views: 571

Answers (1)

Derek Eden

Reputation: 4628

Here are a couple of approaches you could take. I started on them before your most recent comment; going by that comment, the first method might be useful for you if you adapt it a little.

Basically, you can loop through the df in chunks and perform the operations you want on one chunk at a time instead of on the entire df.

import numpy as np
import pandas as pd
data = np.random.rand(1000,3)

df = pd.DataFrame(data)

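# LOOPING BY CHUNKS, STORING EACH CHUNK IN A NP ARRAY INSIDE A LIST
ix = 0        # start row of the current chunk
chunk = 10    # rows per chunk
arrays = []
# iy is the exclusive end row; iloc clips it if it runs past the last row
for iy in range(chunk, len(df) + chunk, chunk):
    arrays.append(df.iloc[ix:iy].to_numpy())
    ix = iy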

# ENTIRE DF TO NP ARRAY (no loop needed)
array = df.to_numpy()

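# LOOPING BY CHUNKS, APPENDING EACH CHUNK TO A SINGLE NP ARRAY
# (np.concatenate copies the whole array on every pass, so this gets slow for large frames)
ix = 0
chunk = 10
array = np.empty((0, df.shape[1]))  # zero rows, same number of columns as df
for iy in range(chunk, len(df) + chunk, chunk):
    array = np.concatenate((array, df.iloc[ix:iy].to_numpy()))
    ix = iy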
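
The loops above cut every 10 rows, while the pseudocode in the question cuts after every 10th row where a column equals 1. One possible adaptation is sketched below. The function name `split_on_match_count` and its parameters are made up for illustration, and it assumes that any leftover rows after the last complete group should be kept as a final, shorter chunk (the question doesn't say what should happen to them).

```python
import numpy as np
import pandas as pd

def split_on_match_count(df, column, value=1, n=10):
    """Cut df after every n-th row where df[column] == value.

    Hypothetical helper: returns a list of NumPy arrays, one per chunk.
    """
    # 0-based row positions where the condition holds
    hit_pos = np.flatnonzero((df[column] == value).to_numpy())
    # every n-th match closes a chunk; the cut falls just after that row
    cuts = hit_pos[n - 1::n] + 1
    chunks = np.split(df.to_numpy(), cuts)
    # assumption: leftover rows stay as a last partial chunk; empty chunks are dropped
    return [c for c in chunks if len(c)]

# small demo: cut after every 2nd row where 'column' == 1
demo = pd.DataFrame({"column": [1, 0, 1, 1, 0, 1, 1, 0], "x": range(8)})
for part in split_on_match_count(demo, "column", value=1, n=2):
    print(part.shape)
```

This avoids `iterrows()` entirely: the cut positions are computed once and `np.split` does the slicing. If you want to throw away the incomplete tail instead, drop the last element of the returned list when it contains fewer than `n` matches.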

Upvotes: 1
