Reputation: 858
I want to select data from the pandas dataframe based on the repeated range of values in a particular column. For example, I've df below.
range_col col1 col2
1 a b
2 c d
3 e f
1 m n
2 o p
3 r s
1 e f
2 j k
3 l m
In above df, I would like to iterate thru chunk of data by selecting ranges 1 2 3
from range_col
. So first chunk will be
range_col col1 col2
1 a b
2 c d
3 e f
Second will be
range_col col1 col2
1 m n
2 o p
3 r s
etc...
How can I iterate on df like this?
Upvotes: 0
Views: 555
Reputation: 30930
Use Series.cumsum
+ Series.min
(see Detail of groups
) to create groups using DataFrame.groupby
:
groups=(df['range_col']==df['range_col'].min()).cumsum()
for i,group in df.groupby(groups):
print(group)
print('-'*20)
Output:
range_col col1 col2
0 1 a b
1 2 c d
2 3 e f
--------------------
range_col col1 col2
3 1 m n
4 2 o p
5 3 r s
--------------------
range_col col1 col2
6 1 e f
7 2 j k
8 3 l m
--------------------
Detail:
print(groups)
0 1
1 1
2 1
3 2
4 2
5 2
6 3
7 3
8 3
Name: range_col, dtype: int64
You could save the dataframes in a dict:
groups=(df['range_col']==df['range_col'].min()).cumsum()
dfn={i:group for i,group in df.groupby(groups)}
for n in dfn:
print(f'df[{n}]')
print(dfn[n])
print('-'*20)
df[1]
range_col col1 col2
0 1 a b
1 2 c d
2 3 e f
--------------------
df[2]
range_col col1 col2
3 1 m n
4 2 o p
5 3 r s
--------------------
df[3]
range_col col1 col2
6 1 e f
7 2 j k
8 3 l m
--------------------
Upvotes: 1
Reputation: 61920
You could do:
for _, group in df.groupby(df['range_col'] - df.index, sort=False):
print(group)
print()
Output
range_col col1 col2
0 1 a b
1 2 c d
2 3 e f
range_col col1 col2
3 1 m n
4 2 o p
5 3 r s
range_col col1 col2
6 1 e f
7 2 j k
8 3 l m
This is based on an old recipe from itertools, see here.
Upvotes: 1