Reputation: 4130
I have a dataframe like this.
store daiban signal ... start_time end_time mask
0 0901 0001 0 ... 2020-03-31 00:00:00 2020-03-31 00:35:00 0
1 0901 0001 1 ... 2020-03-31 00:35:00 2020-03-31 00:36:40 1
2 0901 0001 2 ... 2020-03-31 00:36:40 2020-03-31 00:38:44 1
3 0901 0001 0 ... 2020-03-31 00:38:44 2020-03-31 01:10:40 0
4 0901 0001 1 ... 2020-03-31 01:10:40 2020-03-31 01:12:24 1
5 0901 0001 2 ... 2020-03-31 01:12:24 2020-03-31 01:13:40 1
6 0901 0001 1 ... 2020-03-31 01:13:40 2020-03-31 01:15:04 1
7 0901 0001 2 ... 2020-03-31 01:15:04 2020-03-31 01:17:00 1
8 0901 0001 0 ... 2020-03-31 01:17:00 2020-03-31 02:33:04 0
9 0901 0001 1 ... 2020-03-31 02:33:04 2020-03-31 02:34:52 1
10 0901 0001 2 ... 2020-03-31 02:34:52 2020-03-31 02:37:28 1
I want to get all the rows between the zeros in the mask column and, for each such group, take the first row's start_time and the last row's end_time.
For example:
1) The first group is index 1 to 2:
1 0901 0001 1 ... 2020-03-31 00:35:00 2020-03-31 00:36:40 1
2 0901 0001 2 ... 2020-03-31 00:36:40 2020-03-31 00:38:44 1
2) Taking the first row's start_time and the last row's end_time of that group gives:
0 0901 0001 2020-03-31 00:35:00 2020-03-31 00:38:44
Expected output:
store daiban start_time end_time
0 0901 0001 2020-03-31 00:35:00 2020-03-31 00:38:44
1 0901 0001 2020-03-31 01:10:40 2020-03-31 01:17:00
2 0901 0001 2020-03-31 02:33:04 2020-03-31 02:37:28
Dataframe to reproduce the example:
import pandas as pd
from pandas import Timestamp
df = pd.DataFrame.from_dict({'store': {0: '0901',
1: '0901',
2: '0901',
3: '0901',
4: '0901',
5: '0901',
6: '0901',
7: '0901',
8: '0901',
9: '0901',
10: '0901'},
'daiban': {0: '0001',
1: '0001',
2: '0001',
3: '0001',
4: '0001',
5: '0001',
6: '0001',
7: '0001',
8: '0001',
9: '0001',
10: '0001'},
'signal': {0: 0, 1: 1, 2: 2, 3: 0, 4: 1, 5: 2, 6: 1, 7: 2, 8: 0, 9: 1, 10: 2},
'cum_sum': {0: 525,
1: 25,
2: 31,
3: 479,
4: 26,
5: 19,
6: 21,
7: 29,
8: 1141,
9: 27,
10: 39},
'seconds': {0: 2100,
1: 100,
2: 124,
3: 1916,
4: 104,
5: 76,
6: 84,
7: 116,
8: 4564,
9: 108,
10: 156},
'start_time': {0: Timestamp('2020-03-31 00:00:00'),
1: Timestamp('2020-03-31 00:35:00'),
2: Timestamp('2020-03-31 00:36:40'),
3: Timestamp('2020-03-31 00:38:44'),
4: Timestamp('2020-03-31 01:10:40'),
5: Timestamp('2020-03-31 01:12:24'),
6: Timestamp('2020-03-31 01:13:40'),
7: Timestamp('2020-03-31 01:15:04'),
8: Timestamp('2020-03-31 01:17:00'),
9: Timestamp('2020-03-31 02:33:04'),
10: Timestamp('2020-03-31 02:34:52')},
'end_time': {0: Timestamp('2020-03-31 00:35:00'),
1: Timestamp('2020-03-31 00:36:40'),
2: Timestamp('2020-03-31 00:38:44'),
3: Timestamp('2020-03-31 01:10:40'),
4: Timestamp('2020-03-31 01:12:24'),
5: Timestamp('2020-03-31 01:13:40'),
6: Timestamp('2020-03-31 01:15:04'),
7: Timestamp('2020-03-31 01:17:00'),
8: Timestamp('2020-03-31 02:33:04'),
9: Timestamp('2020-03-31 02:34:52'),
10: Timestamp('2020-03-31 02:37:28')},
'mask': {0: 0, 1: 1, 2: 1, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1, 8: 0, 9: 1, 10: 1}})
Upvotes: 2
Views: 386
Reputation: 323376
IIUC, we can use cumsum to create the group key, filter out the mask == 0 rows, then use agg:
df = (df.loc[df['mask'].ne(0)]                                       # keep only mask == 1 rows
        .groupby([df['mask'].eq(0).cumsum(), df.store, df.daiban])   # key: a new group starts after each mask == 0 row
        .agg({'start_time': 'first', 'end_time': 'last'})
        .reset_index())
mask store daiban start_time end_time
0 1 0901 0001 2020-03-31 00:35:00 2020-03-31 00:38:44
1 2 0901 0001 2020-03-31 01:10:40 2020-03-31 01:17:00
2 3 0901 0001 2020-03-31 02:33:04 2020-03-31 02:37:28
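The mask column in this result is just the helper group key; if you want the exact frame shown in the question's expected output, you can drop it afterwards (a small follow-up sketch):
print(df.drop(columns='mask'))
# store, daiban, start_time, end_time -- same rows as the expected output in the question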
Upvotes: 1
Reputation: 4284
The key here is to create a group column that labels each run of 1s. This can be done with cumsum:
df['group'] = (1-df['mask']).cumsum()
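On the question's sample data this assigns one label per block that starts at a mask == 0 row (a quick check, not part of the solution):
print(df['group'].tolist())
# [1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3]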
Then keep only the rows where mask is 1:
df_filter = df.loc[df['mask']==1]
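On the sample data this keeps indices 1-2, 4-7 and 9-10, i.e. the three runs of 1s (another quick check):
print(df_filter.index.tolist())
# [1, 2, 4, 5, 6, 7, 9, 10]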
You can then group by group, store and daiban to get one row per group, computing the min start_time and max end_time within each group:
(df_filter
 .groupby(['group', 'store', 'daiban'])
 .agg({'start_time': 'min', 'end_time': 'max'})
 .reset_index()
 .drop('group', axis=1))
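If you prefer named aggregation (a sketch assuming pandas >= 0.25, where agg accepts column=(source_column, func) keywords), an equivalent version is:
out = (df_filter
       .groupby(['group', 'store', 'daiban'], as_index=False)
       .agg(start_time=('start_time', 'min'), end_time=('end_time', 'max'))
       .drop('group', axis=1))
print(out)   # store, daiban, start_time, end_time -- matches the expected output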
Upvotes: 1