How to group indices of consecutive occurrences of pandas column values

Question

I have a data frame like this:

df
col1     col2
  1        A
  2        A
  3        A
  4        B
  5        B
  6        A
  7        C
  8        C
  9        C
  10       A
  11       A
  12       A
  13       B
  14       B
  15       A

I want to find the indices of the rows where col2 is anything other than A, group consecutive indices together, and put the groups in a list.

So the list will contain the indices of consecutively occurring values other than A.

The list will look like this:

[[3,4],[6,7,8],[12,13]]

I could do it with a for loop that checks the previous and next values, but the execution time would be huge. I am looking for a pandas shortcut or Pythonic way to do this as efficiently as possible.

jezrael · Accepted Answer

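The idea is to first shift col2 up by one row with Series.shift, so that each col1 value (which starts at 1) lines up with the label of the following row and therefore equals that row's 0-based index. Then number the consecutive runs by comparing against the shifted values with Series.ne and taking Series.cumsum, keep only the rows that are not A, and pass the result to groupby with list: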

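# Shift col2 up one row so each col1 value lines up with the next row's label
s = df['col2'].shift(-1).ffill()
# Number consecutive runs, then keep only the rows whose label is not 'A'
g = s.ne(s.shift()).cumsum()[s != 'A']
# Collect the col1 values of each run into a list
out = df['col1'].groupby(g).apply(list).tolist()
print(out)
[[3, 4], [6, 7, 8], [12, 13]]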
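The snippet above relies on col1 running from 1 to n, so that the shifted col1 values coincide with the 0-based row indices. As a minimal sketch of a variant that groups the index labels directly, assuming the default RangeIndex and the sample data from the question (the helper name group_run_indices and its parameters are hypothetical, not part of pandas or of the original answer), something like this could be used:

```python
import pandas as pd

# Sample frame rebuilt from the question (default 0-based RangeIndex assumed)
df = pd.DataFrame({
    "col1": range(1, 16),
    "col2": list("AAABBACCCAAABBA"),
})


# Hypothetical helper name; not part of pandas or of the original answer
def group_run_indices(frame, column="col2", skip="A"):
    labels = frame[column]
    # A new run starts wherever the label differs from the previous row
    run_id = labels.ne(labels.shift()).cumsum()
    # Rows to keep: everything whose label is not the skipped value
    keep = labels.ne(skip)
    # Series of index labels, so the result does not depend on col1
    idx = pd.Series(frame.index, index=frame.index)
    return idx[keep].groupby(run_id[keep]).apply(list).tolist()


out = group_run_indices(df)
print(out)  # [[3, 4], [6, 7, 8], [12, 13]]
```

Because the run labels are computed on the unshifted column, this version should give the same result whether or not col1 exists, at the cost of building one extra Series for the index.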
