Reputation: 117
I'm trying to group a DataFrame by consecutive runs of values, so that the same value appearing again later starts a new group. Here is my sample code:
import pandas as pd
df = pd.DataFrame([{'c1': 'v1', 'c2': 1},
                   {'c1': 'v1', 'c2': 2},
                   {'c1': 'v2', 'c2': 3},
                   {'c1': 'v1', 'c2': 4},
                   {'c1': 'v2', 'c2': 5},
                   {'c1': 'v2', 'c2': 6},
                   {'c1': 'v3', 'c2': 7}])
df['test'] = 'test'
df1 = df.groupby(['test', 'c1'])['c2'].describe()[['min', 'max']]
print(df1)
Here is the result:
          min  max
test c1
test v1   1.0  4.0
     v2   3.0  6.0
     v3   7.0  7.0
But I'm looking for a way to get the following result:
          min  max
test c1
test v1   1.0  2.0
     v2   3.0  3.0
     v1   4.0  4.0
     v2   5.0  6.0
     v3   7.0  7.0
Upvotes: 1
Views: 55
Reputation: 3598
Use:
df1 = df.groupby(['test', 'c1', df.c1.ne(df.c1.shift()).cumsum()]).c2.describe()[['min', 'max']].droplevel(2)
Result:
          min  max
test c1
test v1   1.0  2.0
     v1   4.0  4.0
     v2   3.0  3.0
     v2   5.0  6.0
     v3   7.0  7.0
Note the call to droplevel at the end of the chain (here the DataFrame.droplevel method), which removes the helper level, at position 2, from the result's MultiIndex.
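To make the grouping key less opaque, here is a minimal sketch of what df.c1.ne(df.c1.shift()).cumsum() computes on the question's data. The names starts_new_run and run_id are illustrative only and do not appear in the original code.

```python
import pandas as pd

df = pd.DataFrame({'c1': ['v1', 'v1', 'v2', 'v1', 'v2', 'v2', 'v3'],
                   'c2': [1, 2, 3, 4, 5, 6, 7]})

# True wherever c1 differs from the previous row. The first row is always
# True, because shift() leaves NaN there and 'v1' != NaN.
starts_new_run = df['c1'].ne(df['c1'].shift())

# The cumulative sum of those booleans gives every consecutive run its own id.
run_id = starts_new_run.cumsum()

print(run_id.tolist())  # [1, 1, 2, 3, 4, 4, 5]
```

Grouping by this series in addition to c1 is what keeps the two separate v1 runs (ids 1 and 3) from being merged into one group.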
Upvotes: 2
Reputation: 22523
IIUC you need to group by consecutive runs of c1:
df1 = (df.assign(group=df["c1"].ne(df["c1"].shift()).cumsum())
         .groupby(['test', 'c1', "group"])['c2'].describe()[['min', 'max']]
         .sort_index(level=2))
print(df1)
               min  max
test c1 group
test v1 1      1.0  2.0
     v2 2      3.0  3.0
     v1 3      4.0  4.0
     v2 4      5.0  6.0
     v3 5      7.0  7.0
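If the helper level should not appear in the final index, one possible variant (a sketch, not something either answer proposed) is to pass sort=False to groupby so the groups stay in order of first appearance, and then drop the helper level. The names run and ordered are illustrative, and agg(['min', 'max']) is swapped in for describe(), so the values keep their integer dtype instead of becoming floats.

```python
import pandas as pd

df = pd.DataFrame({'c1': ['v1', 'v1', 'v2', 'v1', 'v2', 'v2', 'v3'],
                   'c2': [1, 2, 3, 4, 5, 6, 7]})
df['test'] = 'test'

# Id of each consecutive run of c1, named so the index level can be dropped by name.
run = df['c1'].ne(df['c1'].shift()).cumsum().rename('run')

# sort=False keeps groups in order of first appearance, i.e. the original row order.
ordered = (df.groupby(['test', 'c1', run], sort=False)['c2']
             .agg(['min', 'max'])
             .droplevel('run'))

print(ordered)
```

On the sample data this should list the rows in the order v1, v2, v1, v2, v3, which is the order asked for in the question.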
Upvotes: 1