solopiu

Reputation: 756

Select next N rows in pandas dataframe using iterrows

I need to select the next N rows of a pandas DataFrame each time I call a function, using iterrows. Something like this:

def func():
    selected = []
    for i in range(N):
        selected.append(next(dataframe.iterrows()))

    yield selected

But with this, selected contains N identical elements, and every call to func returns the same result (the first row of the dataframe).

If the dataframe is:

   A  B  C
0  5  8  2
1  1  2  3
2  4  5  6
3  7  8  9
4  0  1  2
5  3  4  5
6  7  8  6
7  1  2  3

What I want to obtain is:

N = 3
selected = [ [5,8,2], [1,2,3], [4,5,6] ] 
then, calling again the function,
selected = [ [7,8,9], [0,1,2], [3,4,5] ] 
then,
selected = [ [7,8,6], [1,2,3], [5,8,2] ] 

Upvotes: 3

Views: 8270

Answers (5)

Dan

Reputation: 45752

No need for .iterrows(), rather use slicing:

import pandas as pd

def flow_from_df(dataframe: pd.DataFrame, chunk_size: int = 10):
    # Yield consecutive slices of at most chunk_size rows.
    for start_row in range(0, dataframe.shape[0], chunk_size):
        end_row = min(start_row + chunk_size, dataframe.shape[0])
        yield dataframe.iloc[start_row:end_row, :]

To use it:

get_chunk = flow_from_df(dataframe)
chunk1 = next(get_chunk)
chunk2 = next(get_chunk)

Or, without a generator:

def get_chunk(dataframe: pd.DataFrame, chunk_size: int, start_row: int = 0) -> pd.DataFrame:
    end_row  = min(start_row + chunk_size, dataframe.shape[0])

    return dataframe.iloc[start_row:end_row, :]
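
As a rough usage sketch for the non-generator version, the caller keeps track of start_row. The sample dataframe below is copied from the question; the names df, chunk_size and chunks, and the conversion to nested lists with .values.tolist(), are assumptions added for illustration.

```python
import pandas as pd


def get_chunk(dataframe: pd.DataFrame, chunk_size: int, start_row: int = 0) -> pd.DataFrame:
    # Same slicing as above: take at most chunk_size rows starting at start_row.
    end_row = min(start_row + chunk_size, dataframe.shape[0])
    return dataframe.iloc[start_row:end_row, :]


# Sample data copied from the question.
df = pd.DataFrame({
    "A": [5, 1, 4, 7, 0, 3, 7, 1],
    "B": [8, 2, 5, 8, 1, 4, 8, 2],
    "C": [2, 3, 6, 9, 2, 5, 6, 3],
})

chunk_size = 3
chunks = []
# The caller advances start_row by chunk_size on every step.
for start_row in range(0, len(df), chunk_size):
    chunk = get_chunk(df, chunk_size, start_row)
    chunks.append(chunk.values.tolist())  # plain list of lists, as in the question

print(chunks)
```

The last chunk holds only the two remaining rows, since this version does not wrap around to the start of the dataframe.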

Upvotes: 7

solopiu

Reputation: 756

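I think I found an answer. Doing this:

# The default argument is evaluated once, when func is defined,
# so every call shares the same iterator and continues where the last one stopped.
def func(rows = df.iterrows(), N=3):
    selected = []
    for i in range(N):
        selected.append(next(rows))

    yield selected

selected = next(func())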
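
The snippet above works because the iterator is created only once and reused by every call. A possible variation on the same idea, sketched here with itertools.islice and itertools.cycle, also gives the wrap-around shown in the question's third expected result. The names make_batcher and next_batch are hypothetical, and the sample data is copied from the question.

```python
from itertools import cycle, islice

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": [5, 1, 4, 7, 0, 3, 7, 1],
    "B": [8, 2, 5, 8, 1, 4, 8, 2],
    "C": [2, 3, 6, 9, 2, 5, 6, 3],
})


def make_batcher(dataframe, n=3):
    """Return a function that gives the next n rows on every call, wrapping around."""
    # One iterator is created here and shared by all calls, so each call
    # continues where the previous one stopped. cycle() restarts from the
    # first row once the last row has been consumed.
    rows = cycle(dataframe.values.tolist())

    def next_batch():
        return list(islice(rows, n))

    return next_batch


next_batch = make_batcher(df, n=3)
first = next_batch()   # [[5, 8, 2], [1, 2, 3], [4, 5, 6]]
second = next_batch()  # [[7, 8, 9], [0, 1, 2], [3, 4, 5]]
third = next_batch()   # [[7, 8, 6], [1, 2, 3], [5, 8, 2]]
```

Note that cycle() keeps a copy of every row it has seen, so this is only a reasonable choice for dataframes that fit comfortably in memory.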

Upvotes: 1

SM Abu Taher Asif

Reputation: 2331

Use return instead of yield. If you want the plain data in selected as a list of lists, you can do this:

def func():
    selected = []
    # Assumes the default integer index (0, 1, 2, ...), so index < N keeps the first N rows.
    for index, row in df.iterrows():
        if index < N:
            selected.append([row['A'], row['B'], row['C']])
        else:
            break

    return selected

Upvotes: 1

Gravity Mass

Reputation: 605

I am assuming you are calling the function in a loop. You can try this.

def select_in_df(start, end):
    selected = data_frame[start:end]  # rows start .. end-1 (the end of a slice is exclusive)
    selected = selected.values.tolist()
    return selected


print(select_in_df(0, 3)) # to update the start and end values, you can use any loop or whatever is convenient

# here is an example
start = 0
end = 3
for i in range(10): # instead of range you can use data_frame.iterrows()
    select_in_df(start, end) # 0:3, which gives you 3 rows
    start = end
    end = end + 3

Upvotes: 1

U13-Forward

Reputation: 71570

Try using:

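import numpy as np

def func(dataframe, N=3):
    # Note: an integer second argument to np.array_split is the number of chunks, not the number of rows per chunk.
    return np.array_split(dataframe.values, N)

print(func(dataframe))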

Output:

[array([[5, 8, 2],
       [1, 2, 3],
       [4, 5, 6]]), array([[7, 8, 9],
       [0, 1, 2],
       [3, 4, 5]]), array([[7, 8, 6],
       [1, 2, 3]])]
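
Note that an integer passed as the second argument of np.array_split is the number of sections, so the output above has 3 rows per chunk only because 8 rows split into 3 sections happen to give sizes 3, 3 and 2. A minimal sketch that fixes the number of rows per chunk instead passes explicit split points. The name split_by_rows and its rows_per_chunk parameter are hypothetical, and the sample data is copied from the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": [5, 1, 4, 7, 0, 3, 7, 1],
    "B": [8, 2, 5, 8, 1, 4, 8, 2],
    "C": [2, 3, 6, 9, 2, 5, 6, 3],
})


def split_by_rows(dataframe, rows_per_chunk=3):
    values = dataframe.values
    # Split points at rows_per_chunk, 2 * rows_per_chunk, ... below the row count.
    split_points = np.arange(rows_per_chunk, len(values), rows_per_chunk)
    return np.array_split(values, split_points)


chunks = split_by_rows(df, rows_per_chunk=2)
sizes = [len(chunk) for chunk in chunks]
print(sizes)
```

With rows_per_chunk=2 this gives four chunks of two rows each, whereas np.array_split(df.values, 2) would give two chunks of four rows.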

Upvotes: 0
