Alex Poca

Reputation: 2566

Pandas: how to group on column change?

I am working with a log system, and I need to group data in a non-standard way. With my limited knowledge of Pandas I couldn't find any example, probably because I don't know the proper search terms.

This is a sample dataframe:

df = pd.DataFrame({
    "speed": [2, 4, 6, 8, 8, 9, 2, 3, 8, 9, 13, 18, 25, 27, 18, 8, 6, 8, 12, 20, 27, 34, 36, 41, 44, 54, 61, 60, 61, 40, 17, 12, 15, 24], 
    "class": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 5, 5, 3, 1, 1, 1, 2]
})

df.groupby(by="class").groups returns the indexes of each row, all grouped together by class value:

class  indexes
1: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18, 30, 32],
2: [12, 13, 19, 20, 21, 22, 33],  
3: [23, 24, 29],
4: [25],
5: [26, 27, 28]

Instead, I need to split every time the class column changes:

    speed  class
0       2      1
1       4      1
2       6      1
3       8      1
4       8      1
5       9      1
6       2      1
7       3      1
8       8      1
9       9      1
10     13      1
11     18      1
12     25      2  <= split here
13     27      2
14     18      1  <= split here
15      8      1
16      6      1
17      8      1
18     12      1  <= split here
19     20      2
20     27      2
21     34      2
22     36      2  <= split here
23     41      3
24     44      3  <= split here
25     54      4  <= split here
26     61      5
27     60      5
28     61      5  <= split here
29     40      3  <= split here
30     17      1  <= split here
31     12      1
32     15      1
33     24      2  <= split here

The desired grouping should return something like:

   class  count   mean
0    1      12     7.50
1    2       2    26.00
2    1       5    10.40
3    2       4    29.25
4    3       2    42.50
5    4       1    54.00
6    5       3    60.66
7    3       1    40.00
8    1       3    14.66
9    2       1    24.00

Is there any way to do this without iterating?

Upvotes: 2

Views: 491

Answers (2)

sacuL

Reputation: 51425

You can groupby the cumsum of a boolean marking where the class column differs from the value above it:

df.groupby(df["class"].diff().ne(0).cumsum()).speed.agg(['size', 'mean'])

       size       mean
class
1        12   7.500000
2         2  26.000000
3         5  10.400000
4         4  29.250000
5         2  42.500000
6         1  54.000000
7         3  60.666667
8         1  40.000000
9         3  14.666667
10        1  24.000000
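The grouping key here is just a run identifier. A minimal sketch of how diff().ne(0).cumsum() builds it, using a short made-up class series for illustration:

```python
import pandas as pd

cls = pd.Series([1, 1, 2, 2, 1, 3, 3])

# diff() is NaN on the first row and non-zero wherever the value changes;
# ne(0) marks those positions True (NaN != 0 is True, so row 0 opens run 1);
# cumsum() then numbers each consecutive run of equal values.
run_id = cls.diff().ne(0).cumsum()
print(run_id.tolist())  # [1, 1, 2, 2, 3, 4, 4]
```

Note that repeated class values later in the series (the second run of 1) get a fresh run number, which is exactly what separates this from a plain groupby on class.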

Update: I hadn't seen how you wanted the class column: what you can do is group by the original class column as well as the cumsum above, and do a bit of index-sorting and resetting (but at this point this answer just converges with @jezrael's answer :P)

result = (
    df.groupby(["class", df["class"].diff().ne(0).cumsum()])
    .speed.agg(["size", "mean"])
    .sort_index(level=1)
    .reset_index(level=0)
    .reset_index(drop=True)
)

   class  size       mean
0      1    12   7.500000
1      2     2  26.000000
2      1     5  10.400000
3      2     4  29.250000
4      3     2  42.500000
5      4     1  54.000000
6      5     3  60.666667
7      3     1  40.000000
8      1     3  14.666667
9      2     1  24.000000

Upvotes: 3

jezrael

Reputation: 863701

Compare the class column with its shifted values using Series.ne, take the Series.cumsum, and aggregate by GroupBy.agg:

g = df["class"].ne(df["class"].shift()).cumsum()
df = (df.groupby(['class', g], sort=False)['speed'].agg(['count','mean'])
        .reset_index(level=1, drop=True)
        .reset_index())
print(df)
   class  count       mean
0      1     12   7.500000
1      2      2  26.000000
2      1      5  10.400000
3      2      4  29.250000
4      3      2  42.500000
5      4      1  54.000000
6      5      3  60.666667
7      3      1  40.000000
8      1      3  14.666667
9      2      1  24.000000
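As a sanity check (not part of either answer), the same consecutive-run segmentation can be reproduced with plain itertools.groupby on the question's data, since it also starts a new group whenever the key changes:

```python
from itertools import groupby

speed = [2, 4, 6, 8, 8, 9, 2, 3, 8, 9, 13, 18, 25, 27, 18, 8, 6, 8,
         12, 20, 27, 34, 36, 41, 44, 54, 61, 60, 61, 40, 17, 12, 15, 24]
klass = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1,
         1, 2, 2, 2, 2, 3, 3, 4, 5, 5, 5, 3, 1, 1, 1, 2]

# Group consecutive rows that share a class value; a new group starts
# every time the key (the class) changes.
runs = []
for c, grp in groupby(zip(klass, speed), key=lambda t: t[0]):
    vals = [s for _, s in grp]
    runs.append((c, len(vals), sum(vals) / len(vals)))

print(runs[0])  # (1, 12, 7.5)
```

This yields the same 10 (class, count, mean) rows as the pandas solutions above, confirming the run boundaries.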

Upvotes: 4
