Reputation: 65
I have a large dataframe (1.5 million rows, 13 columns) and I want to retrieve the index of the first occurrence in each group of consecutive events.
The events repeat in groups of varying lengths, as in my example data.
How can I get a list of the indices of all the first 'a' events, and another for all the first 'b' events?
Example data:
import pandas as pd
data = {'event': ['a','a','a','a','a','b','b','b','b','a','a','a','b','b','b','b','b','a','a','a','b','b','b','b']}
df = pd.DataFrame(data, columns=['event'])
Upvotes: 0
Views: 42
Reputation: 30991
As I understand it, you want the first row of each sequence of consecutive rows with the same value in the event column.
The code to get this result is:
df[df.event != df.event.shift()]
(This compares each value with the previous one, marks the rows where they differ, and uses that boolean result for indexing. The first row is always kept, because shift() puts NaN in front of it and NaN never equals the value.)
For your data sample the result is:
event
0 a
5 b
9 a
12 b
17 a
20 b
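To get separate lists per event, as the question asks, the same comparison can be combined with a filter on the event value. This is a minimal sketch on the sample data; the names run_starts, first_a and first_b are mine, not from the question, and it assumes the default integer index.

```python
import pandas as pd

data = {'event': ['a','a','a','a','a','b','b','b','b','a','a','a',
                  'b','b','b','b','b','a','a','a','b','b','b','b']}
df = pd.DataFrame(data, columns=['event'])

# True on every row whose value differs from the previous row,
# i.e. on the first row of each run of identical events.
run_starts = df.event != df.event.shift()

# Keep only the run starts for one event value and read off the index labels.
first_a = df[run_starts & (df.event == 'a')].index.tolist()
first_b = df[run_starts & (df.event == 'b')].index.tolist()

print(first_a)  # [0, 9, 17]
print(first_b)  # [5, 12, 20]
```

Both steps are vectorized comparisons, so this should stay reasonably fast on a frame with 1.5 million rows, though I have not timed it at that size.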
Upvotes: 2