Reputation:
I am trying to subset a data frame like below :-
If the entries in "time_col" column of a dataframe are as follows:
# 00:00:00, 00:10:00,16:14:00, 22:59:00, 01:00:00, 04:14:00, 08:18:00, 18:18:00
then we need to create 2 data frames becuase the times above represent 2 different days.
# data_frame_1 -> 00:00:00, 00:10:00,16:14:00, 22:59:00
# data_frame_2 -> 01:00:00, 04:14:00, 08:18:00, 18:18:00
I am having a timeseries Data and the time which is in sequence but after the time 23:59:00 its again starting from 00:00:00 but that is for the next day.
In simple words if a column in a df is having value like below and lowest & highest values are 0 and 9:-
#1,2,3,4,5,6,7,8,9,4,5,6,7,8,1,2,4,5,8
So, I want to filter the df into below multiple dataframes like :-
#df1<- 1,2,3,4,5,6,7,8,9
#df2<- 4,5,6,7,8
#df3<- 1,2,4,5,8
I want to make this dynamic.I mean if there are 3 days or 30 days, it should work in both the case.
Upvotes: 1
Views: 58
Reputation: 8631
For df
:
times
0 00:00:00
1 00:10:00
2 16:14:00
3 22:59:00
4 01:00:00
5 04:14:00
6 08:18:00
7 18:18:00
You can do it in 2 steps as below:
df['times'] = pd.to_datetime(df.times)
df['group_label'] = df.groupby(df.times.diff().dt.days.cumsum().fillna(0)).grouper.group_info[0]
df['times'] = df.times.dt.time
list_dfs = [df.loc[df.group_label==x].drop('group_label',1) for x in df.group_label.unique()]
Output:
[0 00:00:00
1 00:10:00
2 16:14:00
3 22:59:00
Name: times, dtype: object,
4 01:00:00
5 04:14:00
6 08:18:00
7 18:18:00
Name: times, dtype: object]
You can access the list as
list_dfs[0]
times
0 00:00:00
1 00:10:00
2 16:14:00
3 22:59:00
list_dfs[1]
times
4 01:00:00
5 04:14:00
6 08:18:00
7 18:18:00
Upvotes: 0
Reputation: 323246
IIUC
s=pd.Series([1,2,3,4,5,6,7,8,9,4,5,6,7,8,1,2,4,5,8])
s.groupby(s.diff().lt(0).cumsum()).apply(list)
Out[115]:
0 [1, 2, 3, 4, 5, 6, 7, 8, 9]
1 [4, 5, 6, 7, 8]
2 [1, 2, 4, 5, 8]
dtype: object
Upvotes: 1