Silent

Reputation: 25

Python: How to find event occurrences in data?

Is there an easy and efficient way to find specific events in a data series? By an event I mean specific conditions in the data, such as spikes, passing over/under a threshold, or data series crossing over.

I basically have two goals: 1) Compare the data around an event with the data around the next/previous event, in order to analyze how they compare and how adaptations have impacted the event. 2) Copy key data around all the events to a new data frame for statistical analysis.

In my mind, I want to loop sequentially through the events and obtain the index value of the events so I can process the data around it.

Obviously I could choose to loop through all the data, but I suspect there should be a more efficient approach. Any pointers on how best to approach this?

Upvotes: 0

Views: 1253

Answers (1)

mr_mo

Reputation: 1528

I would do something as follows:

# Let's use numpy (you can do the same with pandas or any other array package)
import numpy as np

# Just generate some data for the example
data = np.array([1,2,3,3,2,1]) 

# Let's say we are looking for periods where data is greater than 2.
# First, we indicate all those points
indicators = (data > 2).astype(int) # now we have [0 0 1 1 0 0]

# We difference the indicators, so the result is non-zero wherever the condition changes:
# +1 where data rises above 2 (event start), -1 where it drops back (event end).
# Note that we concatenate 0 at the beginning to keep the original length.
indicators_diff = np.concatenate([[0], np.diff(indicators)])

# Now let's find those indices
diff_locations = np.where(indicators_diff != 0)[0]

# This gives every index where the difference is non-zero.
# Those are the indices of the start and end of each event (the end index is exclusive):
# [event1_start, event1_end, event2_start, ....]
# So we split the resulting vector into its even and odd positions
events_starts_list = diff_locations[::2].tolist()
events_ends_list = diff_locations[1::2].tolist()

# And now we can also gather the events data by iterating the events.
event_data_list = []

for event_start, event_end in zip(events_starts_list, events_ends_list):
    event_data_list.append(data[event_start:event_end])

Since this code relies on numpy's backend, which is written in C, to run most of the loops, it runs extremely quickly. I use it all the time for a quick solution.
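
For the second goal in the question (copying data around each event into a new data frame), the start indices can be used to slice a fixed window around each event. A rough sketch, assuming pandas is available; the window sizes `before`/`after`, the example data and the column names are arbitrary choices for illustration, not something from the question:

```python
import numpy as np
import pandas as pd

data = np.array([1, 2, 3, 3, 2, 1, 1, 4, 1, 1])
events_starts_list = [2, 7]  # start indices of the data > 2 events in this array

before, after = 2, 2  # hypothetical window: samples to keep on each side of a start
rows = []
for event_id, start in enumerate(events_starts_list):
    lo = max(start - before, 0)             # clip the window at the data edges
    hi = min(start + after + 1, len(data))
    for idx in range(lo, hi):
        rows.append({"event": event_id, "offset": idx - start, "value": data[idx]})

windows = pd.DataFrame(rows)
# One row per event, one column per offset relative to the event start
aligned = windows.pivot(index="event", columns="offset", values="value")
```

The outer loop runs once per event rather than once per sample, and the aligned table makes it easy to compare one event with the next or previous one row by row.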

Good luck!

Edit: Added some comments for clarity.
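Note: You may also want to handle special cases, such as the final event running to the end of the data. In that case its end is never detected, and you may get an odd number of elements in the diff_locations variable. If it is odd, just decide on an index (e.g. the last) and append it to diff_locations before splitting it into events_starts_list and events_ends_list. The same care is needed if the data already begins inside an event: the start at index 0 is not detected, so the first entry in diff_locations is an end rather than a start.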
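
One way to sidestep these edge cases is to pad the indicator array with a zero on both sides before differencing, so every event produces both a start and an end. A minimal sketch under that assumption; the function name `find_events` and the example data are made up for illustration:

```python
import numpy as np

def find_events(mask):
    """Return (starts, ends) of runs of True in a boolean array; ends are exclusive."""
    mask = np.asarray(mask, dtype=bool)
    # Pad with 0 on both sides so runs touching either edge still yield a +1 and a -1.
    padded = np.concatenate([[0], mask.astype(int), [0]])
    diff = np.diff(padded)
    starts = np.where(diff == 1)[0]   # condition switches on at this index
    ends = np.where(diff == -1)[0]    # first index after the run (may equal len(mask))
    return starts, ends

# Example where events touch both the start and the end of the data
data = np.array([3, 3, 1, 2, 3, 1, 3])
starts, ends = find_events(data > 2)
event_data_list = [data[s:e] for s, e in zip(starts, ends)]
```

Because of the padding, `starts` and `ends` always have the same length, so no odd/even splitting is needed.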

Upvotes: 1
