user10177566
user10177566

Reputation:

filtering data when the next value in a Time-series is bigger

I am trying to subset a data frame like below :-

If the entries in "time_col" column of a dataframe are as follows:

# 00:00:00, 00:10:00,16:14:00, 22:59:00, 01:00:00, 04:14:00, 08:18:00, 18:18:00

then we need to create 2 data frames becuase the times above represent 2 different days.

# data_frame_1 -> 00:00:00, 00:10:00,16:14:00, 22:59:00
# data_frame_2 -> 01:00:00, 04:14:00, 08:18:00, 18:18:00

I am having a timeseries Data and the time which is in sequence but after the time 23:59:00 its again starting from 00:00:00 but that is for the next day.

------------------------------------------------------------------------------

In simple words if a column in a df is having value like below and lowest & highest values are 0 and 9:-

#1,2,3,4,5,6,7,8,9,4,5,6,7,8,1,2,4,5,8

So, I want to filter the df into below multiple dataframes like :-

#df1<- 1,2,3,4,5,6,7,8,9
#df2<- 4,5,6,7,8
#df3<- 1,2,4,5,8

I want to make this dynamic.I mean if there are 3 days or 30 days, it should work in both the case.

Upvotes: 1

Views: 58

Answers (2)

harpan
harpan

Reputation: 8631

For df:

       times
0   00:00:00
1   00:10:00
2   16:14:00
3   22:59:00
4   01:00:00
5   04:14:00
6   08:18:00
7   18:18:00

You can do it in 2 steps as below:

df['times'] = pd.to_datetime(df.times)
df['group_label'] = df.groupby(df.times.diff().dt.days.cumsum().fillna(0)).grouper.group_info[0]
df['times'] = df.times.dt.time
list_dfs = [df.loc[df.group_label==x].drop('group_label',1) for x in df.group_label.unique()]

Output:

 [0    00:00:00
 1    00:10:00
 2    16:14:00
 3    22:59:00
 Name: times, dtype: object, 
 4    01:00:00
 5    04:14:00
 6    08:18:00
 7    18:18:00
 Name: times, dtype: object]

You can access the list as

list_dfs[0]

    times
0   00:00:00
1   00:10:00
2   16:14:00
3   22:59:00

list_dfs[1]

    times
4   01:00:00
5   04:14:00
6   08:18:00
7   18:18:00

Upvotes: 0

BENY
BENY

Reputation: 323246

IIUC

s=pd.Series([1,2,3,4,5,6,7,8,9,4,5,6,7,8,1,2,4,5,8])
s.groupby(s.diff().lt(0).cumsum()).apply(list)
Out[115]: 
0    [1, 2, 3, 4, 5, 6, 7, 8, 9]
1                [4, 5, 6, 7, 8]
2                [1, 2, 4, 5, 8]
dtype: object

Upvotes: 1

Related Questions