Reputation: 2189
I have a data frame like this,
df
col1 col2
1 A
2 A
3 A
4 B
5 B
6 A
7 C
8 C
9 C
10 A
11 A
12 A
13 B
14 B
15 A
I want to find the indices of the rows where col2 is anything other than A, group consecutive indices together, and put the groups in a list.
Each inner list should hold the indices of one run of consecutive non-A values.
The list should look like this:
[[3,4],[6,7,8],[12,13]]
I could do this with a for loop that checks the previous and next values, but that would be slow on a large frame. I am looking for a pandas shortcut or a more Pythonic way to do it efficiently.
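For reference, a loop-based baseline of the kind described above might look like the sketch below. It is illustrative only: the function name `non_a_runs_loop` and its `skip` parameter are hypothetical, and it assumes the default 0-based index, so positions and index labels coincide.

```python
import pandas as pd

def non_a_runs_loop(values, skip="A"):
    """Collect 0-based positions of consecutive values that differ from `skip`."""
    runs, current = [], []
    for pos, val in enumerate(values):
        if val != skip:
            current.append(pos)      # extend the current run
        elif current:
            runs.append(current)     # an A closes the run
            current = []
    if current:                      # run that reaches the last row
        runs.append(current)
    return runs

df = pd.DataFrame({"col1": range(1, 16), "col2": list("AAABBACCCAAABBA")})
baseline = non_a_runs_loop(df["col2"])
print(baseline)
```

A baseline like this is mainly useful for checking that a vectorized version returns the same groups.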
Upvotes: 3
Views: 322
Reputation: 5471
You can also do it like this.
To get the index as a column, you may need to call reset_index.
import pandas as pd

df = pd.DataFrame([['1', 'A'], ['2', 'A'], ['3', 'A'], ['4', 'B'], ['5', 'B'], ['6', 'A'], ['7', 'C'], ['8', 'C'], ['9', 'C'], ['10', 'A'], ['11', 'A'], ['12', 'A'], ['13', 'B'], ['14', 'B'], ['15', 'A']], columns=('col1', 'col2'))
# Counter that increases at every 'A'; keep it only on the non-A rows
df2 = (df["col2"] == 'A').cumsum()[df["col2"] != 'A']
# Rows missing from df2 are dropped; each remaining group is one run of non-A rows
df3 = df.groupby(df2).apply(lambda g: list(g.index))
df3.to_list()
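If the frame does not have the default 0-based index, the same cumsum key can be applied after resetting the index. The sketch below is one way to wrap that up. The helper name `non_a_positions` and its parameters are hypothetical, and the letter index is made up purely to show the effect of reset_index.

```python
import pandas as pd

def non_a_positions(df, col="col2", skip="A"):
    # Hypothetical helper: 0-based row positions of each run of non-`skip` values.
    df = df.reset_index(drop=True)        # replace any custom index with 0..n-1
    is_skip = df[col].eq(skip)
    # The counter goes up at every `skip` row, so rows between two of them share a key.
    key = is_skip.cumsum()[~is_skip]
    # Rows without a key (the `skip` rows) are dropped by groupby.
    return df.index.to_series().groupby(key).apply(list).tolist()

df = pd.DataFrame(
    {"col1": range(1, 16), "col2": list("AAABBACCCAAABBA")},
    index=list("abcdefghijklmno"),        # illustrative non-default index
)
result = non_a_positions(df)
print(result)
```

Note that with this key, two different non-A runs that touch each other (for example B, B, C, C with no A in between) end up in the same group, because the counter only changes on an A.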
Upvotes: 3
Reputation: 863246
The idea is to first use Series.shift to move the col2 values up one row, then number the consecutive runs with Series.ne against the shifted series plus Series.cumsum, and finally pass those run numbers to Series.groupby and collect each group with list:
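# Shift col2 up one row so each row holds the next row's value; ffill fills the last row
s = df['col2'].shift(-1).ffill()
# Number the runs of equal values and keep only the runs that are not 'A'
g = s.ne(s.shift()).cumsum()[s != 'A']
# Collect the col1 values of each run into a list
out = df['col1'].groupby(g).apply(list).tolist()
print(out)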
[[3, 4], [6, 7, 8], [12, 13]]
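Since the expected list in the question matches the default 0-based index, a variant that skips the shift(-1) step and groups the index itself may be easier to follow. This is a minimal sketch, not the code above: it assumes a default RangeIndex and integer col1, and the names `run_id` and `keep` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"col1": range(1, 16), "col2": list("AAABBACCCAAABBA")})

col = df["col2"]
# A new run starts wherever the value differs from the previous row.
run_id = col.ne(col.shift()).cumsum()
# Only the non-A rows are of interest.
keep = col.ne("A")
# Group the index labels of the kept rows by their run number.
out = df.index.to_series()[keep].groupby(run_id[keep]).apply(list).tolist()
print(out)
```

Because the runs are numbered by value changes, two different non-A runs that touch each other (B, B, C, C) stay in separate groups here.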
Upvotes: 3