Reputation: 89
Is it possible to convert a DataFrame to a NumPy array chunk by chunk with a loop? Something like this, in pseudocode:
counter = 0
for index, row in dataframe.iterrows():
    if row['column'] == 1:
        counter += 1
    if counter == 10:
        take the part of the dataframe where counter is <= 10
        and convert it to numpy and restart the process at the next row
Upvotes: 0
Views: 571
Reputation: 4628
Here are a couple of approaches you could take. I started writing them before your most recent comment; judging by that comment, the first method might be useful for you if you adapt it a little.
Basically, you can loop through the df in chunks and perform the operations you want on one chunk at a time instead of on the entire df.
import numpy as np
import pandas as pd

data = np.random.rand(1000, 3)
df = pd.DataFrame(data)

# LOOPING BY CHUNKS, STORING EACH CHUNK IN A NP ARRAY INSIDE A LIST
ix = 0
chunk = 10
arrays = []
for iy in range(chunk, len(df) + chunk, chunk):
    arrays.append(df.iloc[ix:iy].values)
    ix = iy

# ENTIRE DF TO NP ARRAY
array = df.values

# LOOPING BY CHUNKS, APPENDING EACH CHUNK TO A SINGLE NP ARRAY
ix = 0
chunk = 10
array = np.empty((0, 3))
for iy in range(chunk, len(df) + chunk, chunk):
    array = np.concatenate((array, df.iloc[ix:iy].values))
    ix = iy
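One possible way to adapt the first method to the rule in the question (close a chunk at every 10th row where the column equals 1, then start again at the next row) is sketched below. The helper name split_on_flag_count, its per_chunk parameter, and the sample "value" column are hypothetical and not from the question. It also assumes that leftover rows after the last complete group should be kept as a final, shorter chunk.

```python
import numpy as np
import pandas as pd


def split_on_flag_count(df, column, per_chunk=10):
    """Split df into NumPy arrays, ending a chunk at every
    per_chunk-th row where df[column] == 1 (hypothetical helper)."""
    flags = (df[column] == 1).to_numpy()
    hits = np.cumsum(flags)  # running count of matching rows
    # A chunk ends right after a matching row whose running count
    # is a multiple of per_chunk.
    ends = np.flatnonzero(flags & (hits % per_chunk == 0)) + 1
    bounds = [0, *ends.tolist()]
    # Assumption: keep the trailing rows as a last, shorter chunk.
    if bounds[-1] < len(df):
        bounds.append(len(df))
    return [df.iloc[a:b].to_numpy() for a, b in zip(bounds[:-1], bounds[1:])]


# Sample data: every other row has column == 1.
df = pd.DataFrame({"column": [1, 0] * 25, "value": np.arange(50)})
arrays = split_on_flag_count(df, "column")
print([a.shape for a in arrays])
```

This avoids iterrows() by computing the chunk boundaries with a cumulative sum, then slicing with iloc exactly as in the fixed-size loop above.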
Upvotes: 1