Kallol

Reputation: 2189

How to group indices of continuous occurrence of pandas columns values

I have a data frame like this,

df
col1     col2
  1        A
  2        A
  3        A
  4        B
  5        B
  6        A
  7        C
  8        C
  9        C
  10       A
  11       A
  12       A
  13       B
  14       B
  15       A

I want to find the indices of the rows where col2 is anything other than A, group consecutive indices together, and put the groups in a list.

So the list will contain the indices of each consecutive run of values other than A.

The list will look like this:

[[3,4],[6,7,8],[12,13]]

I could do it with a for loop that checks the previous and next values, but the execution time would be huge. I am looking for a pandas shortcut or Pythonic way to do it efficiently.

Upvotes: 3

Views: 322

Answers (2)

Dev Khadka

Reputation: 5471

You can also do it like this.

If you need the index as a column, you may have to call reset_index first.


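import pandas as pd

df = pd.DataFrame([['1', 'A'], ['2', 'A'], ['3', 'A'], ['4', 'B'], ['5', 'B'], ['6', 'A'], ['7', 'C'], ['8', 'C'], ['9', 'C'], ['10', 'A'], ['11', 'A'], ['12', 'A'], ['13', 'B'], ['14', 'B'], ['15', 'A']], columns=('col1', 'col2'))

# Running count of 'A' rows, kept only for the non-'A' rows: constant within each run
df2 = (df["col2"] == 'A').cumsum()[df["col2"] != 'A']
# Rows without a key (the 'A' rows) are dropped; collect the index labels of each run
df3 = df.groupby(df2).apply(lambda g: list(g.index))
df3.to_list()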
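
To see why the cumsum works as a grouping key, it may help to look at the key itself. This is only a sketch: it rebuilds the sample frame with a shorter constructor (an assumption, with col1 as integers), and the names is_a, key and groups are illustrative.

```python
import pandas as pd

# Sample frame from the question, rebuilt with a shorter (assumed) constructor.
df = pd.DataFrame({"col1": range(1, 16), "col2": list("AAABBACCCAAABBA")})

is_a = df["col2"] == "A"

# Running count of 'A' rows seen so far. The count does not change
# while we walk through a run of non-'A' rows, so every row of a run
# shares one value. Keep the key only for the non-'A' rows.
key = is_a.cumsum()[~is_a]
# key values: 3, 3, 4, 4, 4, 7, 7 at index labels 3, 4, 6, 7, 8, 12, 13
print(key.tolist())

# Same grouping step as in the answer: rows without a key are dropped.
groups = df.groupby(key).apply(lambda g: list(g.index)).to_list()
print(groups)
```

One consequence of this key: two runs of different non-A values that touch each other (say B directly followed by C, with no A in between) share the same count and so end up in one group. That does not happen in the sample data.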

Upvotes: 3

jezrael

Reputation: 863246

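The idea is to first shift the col2 values up by one row with Series.shift, so each row sees the value of the next row (the last row is filled with ffill). Then create labels for consecutive runs by comparing with Series.ne against the shifted Series and taking Series.cumsum, keep only the labels where the shifted value is not A, and pass them to Series.groupby with list. Note that the result is built from the col1 values, not from the index labels: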

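# Each row gets the col2 value of the next row; the last row is forward-filled
s = df['col2'].shift(-1).ffill()
# New label whenever the shifted value changes, kept only where it is not 'A'
g = s.ne(s.shift()).cumsum()[s != 'A']
# Collect the col1 values of each labelled run
out = df['col1'].groupby(g).apply(list).tolist()
print(out)
[[3, 4], [6, 7, 8], [12, 13]]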
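
The snippet above collects col1 values from rows shifted up by one, which lines up with the requested indices because col1 is index + 1 in the sample. If col1 cannot be relied on that way, a possible variant is to group the index labels directly and skip the shift(-1). This is a sketch under that assumption, not the method of the answer; the frame is rebuilt with a shorter constructor, and run_id and not_a are illustrative names.

```python
import pandas as pd

# Sample frame from the question, rebuilt with a shorter (assumed) constructor.
df = pd.DataFrame({"col1": range(1, 16), "col2": list("AAABBACCCAAABBA")})

col = df["col2"]

# A new run starts wherever the value differs from the previous row,
# so the cumulative sum of those change flags labels each run.
run_id = col.ne(col.shift()).cumsum()

# Keep only the rows whose value is not 'A'.
not_a = col.ne("A")

# Group the index labels of the remaining rows by their run label.
out = df.index.to_series()[not_a].groupby(run_id[not_a]).apply(list).tolist()
print(out)
```

For the sample frame, out equals [[3, 4], [6, 7, 8], [12, 13]]. Because the run label changes on every value change, a B run directly followed by a C run would be returned as two separate groups.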

Upvotes: 3
