korbee82

Reputation: 117

Group by DataFrame based on consecutive ordered values

I'm trying to group a DataFrame by runs of consecutive equal values in c1, so that rows with the same value form one group only while they are adjacent. Here is my sample code:

import pandas as pd

df = pd.DataFrame([{'c1': 'v1', 'c2': 1},
                   {'c1': 'v1', 'c2': 2},
                   {'c1': 'v2', 'c2': 3},
                   {'c1': 'v1', 'c2': 4},
                   {'c1': 'v2', 'c2': 5},
                   {'c1': 'v2', 'c2': 6},
                   {'c1': 'v3', 'c2': 7}])
df['test'] = 'test'
df1 = df.groupby(['test', 'c1'])['c2'].describe()[['min', 'max']]
print(df1)

Here is the result:

         min  max
test c1          
test v1  1.0  4.0
     v2  3.0  6.0
     v3  7.0  7.0

But I'm looking for a way to get the following result:

         min  max
test c1          
test v1  1.0  2.0
     v2  3.0  3.0
     v1  4.0  4.0
     v2  5.0  6.0
     v3  7.0  7.0
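To pin down the intended semantics: Python's standard itertools.groupby already merges only adjacent items with the same key, which is exactly the behaviour the desired output describes. A minimal plain-Python sketch, assuming the rows are given as (c1, c2) tuples already in the order shown (the names rows and runs are illustrative):

```python
from itertools import groupby

# (c1, c2) pairs from the sample DataFrame, in row order
rows = [('v1', 1), ('v1', 2), ('v2', 3), ('v1', 4), ('v2', 5), ('v2', 6), ('v3', 7)]

# groupby only merges *adjacent* rows with the same key,
# so the two separate v1 runs (and the two v2 runs) stay apart
runs = []
for key, items in groupby(rows, key=lambda row: row[0]):
    values = [c2 for _, c2 in items]
    runs.append((key, min(values), max(values)))

for key, lo, hi in runs:
    print(key, lo, hi)
```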

Upvotes: 1

Views: 55

Answers (2)

ipj

Reputation: 3598

Use:

df1 = (df.groupby(['test', 'c1', df.c1.ne(df.c1.shift()).cumsum()])
         .c2.describe()[['min', 'max']]
         .droplevel(2))

Result:

         min  max
test c1          
test v1  1.0  2.0
     v1  4.0  4.0
     v2  3.0  3.0
     v2  5.0  6.0
     v3  7.0  7.0

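Note the use of pandas.DataFrame.droplevel at the end of the chain: it removes the helper run-id level (level 2) from the resulting MultiIndex. Because groupby sorts the group keys by default, the rows come out ordered by c1 rather than in their original order.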
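The grouping key df.c1.ne(df.c1.shift()).cumsum() is what splits the runs apart. A minimal sketch that prints the intermediate values for the question's data (the names starts_new_run and run_id are illustrative, not part of the answer):

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({'c1': ['v1', 'v1', 'v2', 'v1', 'v2', 'v2', 'v3'],
                   'c2': [1, 2, 3, 4, 5, 6, 7]})

# Compare each c1 value with the previous row's value; the first row is
# compared with NaN, so it always counts as the start of a run
starts_new_run = df['c1'].ne(df['c1'].shift())

# The running total of the True values gives each consecutive run its own id
run_id = starts_new_run.cumsum()

print(starts_new_run.tolist())  # [True, False, True, True, True, False, True]
print(run_id.tolist())          # [1, 1, 2, 3, 4, 4, 5]
```

Rows 0-1 share id 1 and rows 4-5 share id 4, so adding this id to the groupby keys keeps the two v1 runs (and the two v2 runs) in separate groups.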

Upvotes: 2

Henry Yik

Reputation: 22523

IIUC, you need to group by consecutive runs of c1:

df1 = (df.assign(group=df["c1"].ne(df["c1"].shift()).cumsum())
         .groupby(['test', 'c1', "group"])['c2'].describe()[['min', 'max']]
         .sort_index(level=2))

print(df1)

               min  max
test c1 group          
test v1 1      1.0  2.0
     v2 2      3.0  3.0
     v1 3      4.0  4.0
     v2 4      5.0  6.0
     v3 5      7.0  7.0
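If the extra group level and the sort_index step are unwanted, one possible variant (a sketch, not taken from either answer) passes sort=False to groupby so the groups stay in order of first appearance, and uses agg(['min', 'max']) instead of computing the full describe() table. The helper name group is an assumption:

```python
import pandas as pd

df = pd.DataFrame({'c1': ['v1', 'v1', 'v2', 'v1', 'v2', 'v2', 'v3'],
                   'c2': [1, 2, 3, 4, 5, 6, 7]})
df['test'] = 'test'

# Consecutive-run id, renamed so its index level doesn't clash with 'c1'
group = df['c1'].ne(df['c1'].shift()).cumsum().rename('group')

# sort=False keeps groups in order of first appearance, so no sort_index is needed;
# agg(['min', 'max']) computes only the two statistics that are kept
df1 = (df.groupby(['test', 'c1', group], sort=False)['c2']
         .agg(['min', 'max'])
         .droplevel('group'))
print(df1)
```

This should list the rows in the desired order (v1, v2, v1, v2, v3). Unlike describe(), which returns floats, agg keeps min and max as integers for an integer column.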

Upvotes: 1
