Groupby into list for non-consecutive values

Question

I am trying to group this dataset:

    col1  col2
0     A     1
1     B     1
2     C     1
3     D     3
4     E     3
5     F     2
6     G     2
7     H     1
8     I     1
9     J     2
10    K     2

into this:

1: [A, B, C]
3: [D, E]
2: [F, G]
1: [H, I]
2: [J, K]

So each run of consecutive equal values in col2 has to become its own group, instead of all rows with the same value being grouped together at once.

So far I have tried the normal groupby, df.groupby("col2")["col1"].apply(list), but it puts every row with the same col2 value into a single group, which isn't what I need.
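A minimal reproduction of the plain groupby, assuming the sample data above (with the lowercase j written as J), shows the problem: the two separate runs of 1 and the two separate runs of 2 are merged.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "col1": list("ABCDEFGHIJK"),
    "col2": [1, 1, 1, 3, 3, 2, 2, 1, 1, 2, 2],
})

# Plain groupby: rows are grouped by value only, not by position
plain = df.groupby("col2")["col1"].apply(list)

for key, values in plain.items():
    print(f"{key}: {values}")
# 1: ['A', 'B', 'C', 'H', 'I']
# 2: ['F', 'G', 'J', 'K']
# 3: ['D', 'E']
```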

jezrael · Accepted Answer

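You need to distinguish consecutive runs: compare col2 with its shifted values using ne (not equal) and take the cumulative sum, which gives each run of equal values its own id. Group by col2 together with this id, then remove the second level of the resulting MultiIndex: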

# label each run of consecutive equal col2 values with its own id
runs = df["col2"].ne(df["col2"].shift()).cumsum()
s = (df.groupby(["col2", runs], sort=False)["col1"]
       .agg(list)
       .reset_index(level=1, drop=True))
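A self-contained sketch of the same approach, assuming the question's sample data (with J in uppercase), with the intermediate run ids spelled out. It passes sort=False so the groups stay in order of first appearance; with the default sort=True, groupby orders the result by col2 and then by run id, so both runs of 1 would come first. The name "run" given to the run-id Series is a hypothetical label added only for readability.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": list("ABCDEFGHIJK"),
    "col2": [1, 1, 1, 3, 3, 2, 2, 1, 1, 2, 2],
})

# True wherever col2 differs from the previous row
# (the first row is compared with NaN, so it is always True)
starts = df["col2"].ne(df["col2"].shift())

# Cumulative sum of the booleans: each run gets its own id, 1,1,1,2,2,3,3,4,4,5,5
run_id = starts.cumsum().rename("run")  # "run" is a hypothetical name

# Group by value and run id, keeping first-appearance order,
# then drop the run-id level so only the col2 value remains as index
s = (df.groupby(["col2", run_id], sort=False)["col1"]
       .agg(list)
       .reset_index(level=1, drop=True))

# The index contains duplicates (1, 3, 2, 1, 2), so iterate over pairs
# instead of converting to a dict, which would keep only one entry per key
for key, values in s.items():
    print(f"{key}: {values}")
# 1: ['A', 'B', 'C']
# 3: ['D', 'E']
# 2: ['F', 'G']
# 1: ['H', 'I']
# 2: ['J', 'K']
```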
