smm
smm

Reputation: 858

Take chunks of data from pandas df based on repeated values ranges in a column

I want to select data from the pandas dataframe based on the repeated range of values in a particular column. For example, I've df below.

range_col col1 col2
  1        a   b
  2        c   d
  3        e   f
  1        m   n
  2        o   p
  3        r   s
  1        e   f
  2        j   k
  3        l   m 

In above df, I would like to iterate thru chunk of data by selecting ranges 1 2 3 from range_col. So first chunk will be

range_col col1 col2
  1        a   b
  2        c   d
  3        e   f

Second will be

range_col col1 col2
  1        m   n
  2        o   p
  3        r   s

etc...

How can I iterate on df like this?

Upvotes: 0

Views: 555

Answers (2)

ansev
ansev

Reputation: 30930

Use Series.cumsum + Series.min (see Detail of groups) to create groups using DataFrame.groupby:

groups=(df['range_col']==df['range_col'].min()).cumsum()
for i,group in df.groupby(groups):
    print(group)
    print('-'*20)

Output:

       range_col col1 col2
0          1    a    b
1          2    c    d
2          3    e    f
--------------------
   range_col col1 col2
3          1    m    n
4          2    o    p
5          3    r    s
--------------------
   range_col col1 col2
6          1    e    f
7          2    j    k
8          3    l    m
--------------------

Detail:

print(groups)
0    1
1    1
2    1
3    2
4    2
5    2
6    3
7    3
8    3
Name: range_col, dtype: int64

You could save the dataframes in a dict:

groups=(df['range_col']==df['range_col'].min()).cumsum()
dfn={i:group for i,group in df.groupby(groups)}

for n in dfn:
    print(f'df[{n}]')
    print(dfn[n])
    print('-'*20)

df[1]
   range_col col1 col2
0          1    a    b
1          2    c    d
2          3    e    f
--------------------
df[2]
   range_col col1 col2
3          1    m    n
4          2    o    p
5          3    r    s
--------------------
df[3]
   range_col col1 col2
6          1    e    f
7          2    j    k
8          3    l    m
--------------------

Upvotes: 1

Dani Mesejo
Dani Mesejo

Reputation: 61920

You could do:

for _, group in df.groupby(df['range_col'] - df.index, sort=False):
    print(group)
    print()

Output

    range_col col1 col2
0          1    a    b
1          2    c    d
2          3    e    f

   range_col col1 col2
3          1    m    n
4          2    o    p
5          3    r    s

   range_col col1 col2
6          1    e    f
7          2    j    k
8          3    l    m

This is based on an old recipe from itertools, see here.

Upvotes: 1

Related Questions