Reputation: 756
I need to select N rows at a time from a pandas DataFrame using iterrows. Something like this:
def func():
    selected = []
    for i in range(N):
        selected.append(next(dataframe.iterrows()))
    yield selected
But doing this, selected has N equal elements, and each time I call func I always get the same result (the first row of the dataframe).
If the dataframe is:
   A  B  C
0  5  8  2
1  1  2  3
2  4  5  6
3  7  8  9
4  0  1  2
5  3  4  5
6  7  8  6
7  1  2  3
What I want to obtain is:
N = 3
selected = [ [5,8,2], [1,2,3], [4,5,6] ]
then, calling again the function,
selected = [ [7,8,9], [0,1,2], [3,4,5] ]
then,
selected = [ [7,8,6], [1,2,3], [5,8,2] ]
Upvotes: 3
Views: 8270
Reputation: 45752
No need for .iterrows(); use slicing instead:
import pandas as pd

def flow_from_df(dataframe: pd.DataFrame, chunk_size: int = 10):
    # Yield consecutive slices of at most chunk_size rows
    for start_row in range(0, dataframe.shape[0], chunk_size):
        end_row = min(start_row + chunk_size, dataframe.shape[0])
        yield dataframe.iloc[start_row:end_row, :]
To use it:
get_chunk = flow_from_df(dataframe)
chunk1 = next(get_chunk)
chunk2 = next(get_chunk)
Or not using a generator:
def get_chunk(dataframe: pd.DataFrame, chunk_size: int, start_row: int = 0) -> pd.DataFrame:
    end_row = min(start_row + chunk_size, dataframe.shape[0])
    return dataframe.iloc[start_row:end_row, :]
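The expected output in the question wraps around to the first row once the frame is exhausted, which plain slicing does not do. Below is a minimal sketch of one way to get that behaviour, using the example frame from the question. `cycle_rows` is a hypothetical helper name, not part of pandas.

```python
from itertools import cycle, islice

import pandas as pd


def cycle_rows(dataframe: pd.DataFrame, n: int):
    """Yield lists of n rows forever, wrapping around at the end of the frame."""
    # cycle() stores the rows and restarts from the first one when exhausted
    rows = cycle(dataframe.values.tolist())
    while True:
        yield list(islice(rows, n))


# Example frame copied from the question
df = pd.DataFrame(
    {
        "A": [5, 1, 4, 7, 0, 3, 7, 1],
        "B": [8, 2, 5, 8, 1, 4, 8, 2],
        "C": [2, 3, 6, 9, 2, 5, 6, 3],
    }
)

chunks = cycle_rows(df, 3)
first = next(chunks)   # [[5, 8, 2], [1, 2, 3], [4, 5, 6]]
second = next(chunks)  # [[7, 8, 9], [0, 1, 2], [3, 4, 5]]
third = next(chunks)   # [[7, 8, 6], [1, 2, 3], [5, 8, 2]]
```

This copies every row into a Python list up front, so it is probably only reasonable for frames that fit comfortably in memory.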
Upvotes: 7
Reputation: 756
I think I found an answer, doing this:
def func(rows=df.iterrows(), N=3):
    selected = []
    for i in range(N):
        selected.append(next(rows))
    yield selected

selected = next(func())
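This works because the default value `df.iterrows()` is evaluated once, when the function is defined, so every call shares the same iterator. The original attempt called `dataframe.iterrows()` inside the loop, which creates a fresh iterator (and so returns the first row) every time. A small sketch of the difference, using a made-up three-row frame:

```python
import pandas as pd

# Made-up frame for illustration only
df = pd.DataFrame({"A": [5, 1, 4], "B": [8, 2, 5]})

# A fresh iterator on every call always starts at the first row
fresh = [next(df.iterrows())[1].tolist() for _ in range(3)]

# One iterator created up front and reused advances through the rows
rows = df.iterrows()
shared = [next(rows)[1].tolist() for _ in range(3)]

print(fresh)   # [[5, 8], [5, 8], [5, 8]]
print(shared)  # [[5, 8], [1, 2], [4, 5]]
```

Two caveats with the shared default: it is bound to whatever `df` existed when the function was defined, and once it is exhausted further calls fail instead of starting over, so the frame can only be walked once.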
Upvotes: 1
Reputation: 2331
return should be used instead of yield. If you want plain data in selected as a list of lists, you can do this:
def func():
    selected = []
    for index, row in df.iterrows():
        if index < N:
            rowData = []
            rowData.append(row['A'])
            rowData.append(row['B'])
            rowData.append(row['C'])
            selected.append(rowData)
        else:
            break
    return selected
Upvotes: 1
Reputation: 605
I am assuming you are calling the function in a loop. You can try this.
def select_in_df(start, end):
    # Slice rows [start, end) and convert them to a list of lists
    selected = data_frame[start:end]
    return selected.values.tolist()

print(select_in_df(0, 3))  # rows 0, 1 and 2

# To walk through the frame, advance start and end in a loop, for example:
N = 3
for start in range(0, len(data_frame), N):
    print(select_in_df(start, start + N))
Upvotes: 1
Reputation: 71570
Try using:
import numpy as np

def func(dataframe, N=3):
    return np.array_split(dataframe.values, N)
print(func(dataframe))
Output:
[array([[5, 8, 2],
       [1, 2, 3],
       [4, 5, 6]]), array([[7, 8, 9],
       [0, 1, 2],
       [3, 4, 5]]), array([[7, 8, 6],
       [1, 2, 3]])]
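Note that the second argument of `np.array_split` here is the number of sections, not the number of rows per section. The output above has three rows per chunk only because 8 rows split three ways happens to give 3, 3 and 2. If fixed-size chunks are wanted, one option is to pass explicit split indices instead. A sketch on a made-up 10-row array, where `split_by_size` is a hypothetical helper name:

```python
import numpy as np


def split_by_size(values: np.ndarray, chunk_size: int) -> list:
    """Split values into consecutive chunks of chunk_size rows (last may be shorter)."""
    # Split points at chunk_size, 2 * chunk_size, ... below len(values)
    split_points = list(range(chunk_size, len(values), chunk_size))
    return np.array_split(values, split_points)


values = np.arange(20).reshape(10, 2)  # 10 rows, 2 columns

by_sections = [len(c) for c in np.array_split(values, 3)]  # 3 sections
by_size = [len(c) for c in split_by_size(values, 3)]       # 3 rows per chunk

print(by_sections)  # [4, 3, 3]
print(by_size)      # [3, 3, 3, 1]
```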
Upvotes: 0