martsc1

Reputation: 65

How to get all 'first' instances of grouped and recurring values?

I have a large DataFrame (about 1.5 million rows, 13 columns) and I want to retrieve the index of the first occurrence in each group of events.

The events repeat in groups of varying lengths, as in my example data.

How can I get a list with all the first 'a' events, and all the first 'b' events?

Example data:

import pandas as pd
data = {'event': ['a','a','a','a','a','b','b','b','b','a','a','a','b','b','b','b','b','a','a','a','b','b','b','b']}
df = pd.DataFrame(data, columns=['event'])

Upvotes: 0

Views: 42

Answers (1)

Valdi_Bo

Reputation: 30991

As I understand it, you want the first row of each run of consecutive rows that share the same value in the event column.

The code to get this result is:

df[df.event != df.event.shift()]

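(Compare each value with the previous one using shift(), mark the rows where they differ, then use that boolean mask for indexing. The very first row is always kept, because shift() puts NaN there and the comparison with NaN is "different".)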

For your data sample the result is:

   event
0      a
5      b
9      a
12     b
17     a
20     b
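
Since the question asks for separate lists of the first 'a' and first 'b' positions, here is a minimal sketch of one way to split the result above by event. The names is_first, firsts, first_a and first_b are only illustrative, and it assumes the default integer index from the example data.

```python
import pandas as pd

data = {'event': ['a','a','a','a','a','b','b','b','b','a','a','a',
                  'b','b','b','b','b','a','a','a','b','b','b','b']}
df = pd.DataFrame(data, columns=['event'])

# True on the first row of each run of consecutive equal values
is_first = df['event'] != df['event'].shift()
firsts = df[is_first]

# Index labels of the run starts, split by event value
first_a = firsts[firsts['event'] == 'a'].index.tolist()
first_b = firsts[firsts['event'] == 'b'].index.tolist()

print(first_a)  # [0, 9, 17]
print(first_b)  # [5, 12, 20]
```

Both steps are vectorized, so this should stay reasonably fast on a frame with around 1.5 million rows, though I have not timed it.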

Upvotes: 2
