Reputation: 197
I have a time series with speed and would like to detect all parts that are constant for more than a specific time. Lets say with the following data i want to detect when there was no movement for more than 2 minutes and put those parts into another dataframe (along with all other columns)
2020-02-27 15:43:00 0.000000
2020-02-27 15:43:30 0.000000
2020-02-27 15:44:00 0.000000
2020-02-27 15:44:30 0.000000
2020-02-27 15:45:00 0.000000
2020-02-27 15:45:30 0.000000
2020-02-27 15:46:00 0.000000
2020-02-27 15:46:30 0.000000
2020-02-27 15:47:00 0.000000
2020-02-27 15:47:30 0.000000
2020-02-27 15:48:00 0.000000
2020-02-27 15:48:30 0.000000
2020-02-27 15:49:00 0.000000
2020-02-27 15:49:30 0.000000
2020-02-27 15:50:00 0.000000
2020-02-27 15:50:30 0.000000
2020-02-27 15:51:00 0.000000
2020-02-27 15:51:30 0.000000
2020-02-27 15:52:00 1.004333
2020-02-27 15:52:30 2.002667
2020-02-27 15:53:00 5.001000
2020-02-27 15:53:30 6.002667
2020-02-27 15:54:00 8.001000
2020-02-27 15:54:30 4.000667
2020-02-27 15:55:00 3.000000
2020-02-27 15:55:30 0.000000
2020-02-27 15:56:00 0.000000
2020-02-27 15:56:30 0.000000
2020-02-27 15:57:00 0.000000
2020-02-27 15:57:30 0.000000
2020-02-27 15:58:00 0.000000
So then the result would be a df_constant with data from 2020-02-27 15:43:00
until 2020-02-27 15:51:30
& 2020-02-27 15:55:30
until 2020-02-27 15:58:00
Upvotes: 2
Views: 910
Reputation: 62513
datetime
column should be converted to a datetime dtype
, and then sorted on that column, but that column isn't used to determine the consecutive occurrences.val
because, as in the example, the groups of consecutive numbers are not unique (e.g. both groups are 0.0)
.ne
, .shift
, and .cumsum
are used to create a Series, where each sequence of consecutive values, is a unique value.df['val'].groupby(g).transform('count') > 4
creates a Boolean mask, which is use to select rows from df[['datetime', 'val']]
2 minute
period, the count should be >=4
because the time steps are 30 seconds
, and 5 consecutive occurrences is 2 minutesimport pandas as pd
# sample dataframe is the same as the data in the op
data = {'datetime': ['2020-02-27 15:43:00', '2020-02-27 15:43:30', '2020-02-27 15:44:00', '2020-02-27 15:44:30', '2020-02-27 15:45:00', '2020-02-27 15:45:30', '2020-02-27 15:46:00', '2020-02-27 15:46:30', '2020-02-27 15:47:00', '2020-02-27 15:47:30', '2020-02-27 15:48:00', '2020-02-27 15:48:30', '2020-02-27 15:49:00', '2020-02-27 15:49:30', '2020-02-27 15:50:00', '2020-02-27 15:50:30', '2020-02-27 15:51:00', '2020-02-27 15:51:30', '2020-02-27 15:52:00', '2020-02-27 15:52:30', '2020-02-27 15:53:00', '2020-02-27 15:53:30', '2020-02-27 15:54:00', '2020-02-27 15:54:30', '2020-02-27 15:55:00', '2020-02-27 15:55:30', '2020-02-27 15:56:00', '2020-02-27 15:56:30', '2020-02-27 15:57:00', '2020-02-27 15:57:30', '2020-02-27 15:58:00'], 'val': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.004333, 2.002667, 5.001, 6.002667, 8.001, 4.000667, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]}
df = pd.DataFrame(data)
# display(df.head())
datetime val
0 2020-02-27 15:43:00 0.0
1 2020-02-27 15:43:30 0.0
2 2020-02-27 15:44:00 0.0
3 2020-02-27 15:44:30 0.0
4 2020-02-27 15:45:00 0.0
# create a Series with the same index as df, where the consecutive values are unique
g = df.val.ne(df.val.shift()).cumsum()
# use g with groupby to count the consecutive values and then create a Boolean using > 4 (will represent 2 minutes, when the time interval is 30 seconds).
consecutive_data = df[['datetime', 'val']][df['val'].groupby(g).transform('count') > 4]
display(consecutive_data)
datetime val
0 2020-02-27 15:43:00 0.0
1 2020-02-27 15:43:30 0.0
2 2020-02-27 15:44:00 0.0
3 2020-02-27 15:44:30 0.0
4 2020-02-27 15:45:00 0.0
5 2020-02-27 15:45:30 0.0
6 2020-02-27 15:46:00 0.0
7 2020-02-27 15:46:30 0.0
8 2020-02-27 15:47:00 0.0
9 2020-02-27 15:47:30 0.0
10 2020-02-27 15:48:00 0.0
11 2020-02-27 15:48:30 0.0
12 2020-02-27 15:49:00 0.0
13 2020-02-27 15:49:30 0.0
14 2020-02-27 15:50:00 0.0
15 2020-02-27 15:50:30 0.0
16 2020-02-27 15:51:00 0.0
17 2020-02-27 15:51:30 0.0
25 2020-02-27 15:55:30 0.0
26 2020-02-27 15:56:00 0.0
27 2020-02-27 15:56:30 0.0
28 2020-02-27 15:57:00 0.0
29 2020-02-27 15:57:30 0.0
30 2020-02-27 15:58:00 0.0
Upvotes: 2
Reputation: 518
import pandas as pd
from datetime import datetime
d1 = datetime.strptime("2020-02-27 15:43:00","%Y-%m-%d %H:%M:%S")
d2 = datetime.strptime('2020-02-27 15:58:00', "%Y-%m-%d %H:%M:%S")
df = pd.date_range(d1,d2, periods=30)
df = pd.DataFrame(df)
df['val'] = [0]*10 + list(range(10)) + [10]*10
df.columns = ['date','val']
def get_cont_lists(series, n):
'''
Given a list returns list of lists of indices where the values are constant
for >= n consecutive values
'''
lol = []
current_list = []
prev_value = None
for idx,elem in enumerate(series):
if elem == prev_value:
current_list.append(idx)
if elem != prev_value:
lol.append(current_list)
current_list = [idx]
prev_value = elem
lol.append(current_list)
lol = [lst for lst in lol if len(lst)>=n]
return lol
cont_lst = get_cont_lists(lst,4)
cont_lst = [i for j in cont_lst for i in j]
required_df = df.iloc[cont_lst]
print(required_df)
Upvotes: 1