Group by DataFrame based on consecutive ordered values

Question

I'm trying to group a DataFrame by runs of consecutive equal values, so that the original row order is respected. Here is my sample code:

import pandas as pd

df = pd.DataFrame([{'c1': 'v1', 'c2': 1},
                   {'c1': 'v1', 'c2': 2},
                   {'c1': 'v2', 'c2': 3},
                   {'c1': 'v1', 'c2': 4},
                   {'c1': 'v2', 'c2': 5},
                   {'c1': 'v2', 'c2': 6},
                   {'c1': 'v3', 'c2': 7}])
df['test'] = 'test'
df1 = df.groupby(['test', 'c1'])['c2'].describe()[['min', 'max']]
print(df1)

Here is the result:

         min  max
test c1          
test v1  1.0  4.0
     v2  3.0  6.0
     v3  7.0  7.0

But I'm looking for a way to get the following result, where each consecutive run of the same c1 value forms its own group:

         min  max
test c1          
test v1  1.0  2.0
     v2  3.0  3.0
     v1  4.0  4.0
     v2  5.0  6.0
     v3  7.0  7.0

ipj · Accepted Answer

Use:

df1 = df.groupby(['test', 'c1', df.c1.ne(df.c1.shift()).cumsum()]).c2.describe()[['min', 'max']].droplevel(2)

Result (groupby sorts the group keys by default, so both v1 runs are listed before the v2 runs):

         min  max
test c1          
test v1  1.0  2.0
     v1  4.0  4.0
     v2  3.0  3.0
     v2  5.0  6.0
     v3  7.0  7.0

Note the use of pandas.DataFrame.droplevel at the end of the chain. It removes the helper level (the run counter, at position 2) from the MultiIndex of the result.
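To see why the extra grouping key works, and how to keep the rows in their original order, here is a minimal sketch. It assumes the same sample data as in the question. The names run_id, run and ordered are made up for this illustration, and agg(['min', 'max']) is swapped in for describe() so that only the two needed statistics are computed. Passing sort=False asks groupby to keep groups in order of first appearance instead of sorting the keys.

```python
import pandas as pd

df = pd.DataFrame({
    'c1': ['v1', 'v1', 'v2', 'v1', 'v2', 'v2', 'v3'],
    'c2': [1, 2, 3, 4, 5, 6, 7],
})
df['test'] = 'test'

# A new run starts wherever c1 differs from the previous row.
# The cumulative sum of those "run start" flags numbers the runs.
# 'run' is a hypothetical name, used only so the level can be dropped by name.
run_id = df['c1'].ne(df['c1'].shift()).cumsum().rename('run')
# run_id values: 1, 1, 2, 3, 4, 4, 5

ordered = (
    df.groupby(['test', 'c1', run_id], sort=False)['c2']
      .agg(['min', 'max'])   # replaces describe()[['min', 'max']]
      .droplevel('run')      # remove the helper level again
)
print(ordered)
```

With this variant the groups come out as v1, v2, v1, v2, v3, which is the order asked for in the question. The min and max columns keep the integer dtype of c2 rather than the floats that describe() produces.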
