Reputation: 5115
I would like to loop through df
below to find the duration of the activity where:
X
, there are two Y
values.a
turns1
for either unique values of Y
(eg., for X ==18
, a
becomes 1
for Y==13
or a
becomes 1
for Y==14
), it marks the start of the activity; if a
turns 0
for
both unique values of Y
, it marks the end of the activity(eg., for X ==18
, a
becomes 0
for Y==13
and a
becomes 0
for Y==14
). Timestamp X Y a b Type
0 2000-10-26 10:08:27.060 18 14 0.0 24.5 medium
1 2000-10-26 10:39:24.310 18 13 1.0 24.0 low # Start
2 2000-10-26 11:50:48.190 18 14 1.0 23.5 medium
3 2000-10-26 17:18:07.610 18 14 1.0 23.5 medium
4 2000-10-26 17:18:09.610 18 14 0.0 23.5 medium
5 2000-10-26 17:29:10.610 18 14 0.0 26.5 medium
6 2000-10-26 17:29:10.770 18 14 1.0 26.5 medium
7 2000-10-26 17:29:12.610 18 14 1.0 53.5 medium
8 2000-10-26 17:29:14.610 18 14 1.0 62.0 medium
9 2000-10-26 17:29:14.770 18 13 1.0 24.0 low
10 2000-10-26 17:29:16.610 18 14 1.0 64.5 medium
11 2000-10-26 17:29:18.770 18 14 0.0 64.5 medium
12 2000-10-26 17:29:18.770 18 13 0.0 24.0 low # End
13 2000-10-26 17:29:28.770 18 14 0.0 63.5 medium
14 2000-10-26 17:29:34.770 19 16 0.0 62.0 medium
15 2000-10-26 17:29:40.770 19 16 1.0 61.0 medium # Start
16 2000-10-26 17:29:46.770 19 16 1.0 60.0 medium
17 2000-10-26 17:32:01.180 19 17 1.0 25.0 low
18 2000-10-26 17:32:01.180 19 16 0.0 51.5 medium
19 2000-10-26 17:32:35.180 19 17 0.0 50.0 medium # End
reproducible example:
from datetime import *
from pandas import *
df= pd.DataFrame({'Timestamp': {0: Timestamp('2000-10-26 10:08:27.060000'),
1: Timestamp('2000-10-26 10:39:24.310000'),
2: Timestamp('2000-10-26 11:50:48.190000'),
3: Timestamp('2000-10-26 17:18:07.610000'),
4: Timestamp('2000-10-26 17:18:09.610000'),
5: Timestamp('2000-10-26 17:29:10.610000'),
6: Timestamp('2000-10-26 17:29:10.770000'),
7: Timestamp('2000-10-26 17:29:12.610000'),
8: Timestamp('2000-10-26 17:29:14.610000'),
9: Timestamp('2000-10-26 17:29:14.770000'),
10: Timestamp('2000-10-26 17:29:16.610000'),
11: Timestamp('2000-10-26 17:29:18.770000'),
12: Timestamp('2000-10-26 17:29:18.770000'),
13: Timestamp('2000-10-26 17:29:28.770000'),
14: Timestamp('2000-10-26 17:29:34.770000'),
15: Timestamp('2000-10-26 17:29:40.770000'),
16: Timestamp('2000-10-26 17:29:46.770000'),
17: Timestamp('2000-10-26 17:32:01.180000'),
18: Timestamp('2000-10-26 17:32:01.180000'),
19: Timestamp('2000-10-26 17:32:35.180000')},
'X': {0: 18,
1: 18,
2: 18,
3: 18,
4: 18,
5: 18,
6: 18,
7: 18,
8: 18,
9: 18,
10: 18,
11: 18,
12: 18,
13: 18,
14: 19,
15: 19,
16: 19,
17: 19,
18: 19,
19: 19},
'Y': {0: 14,
1: 13,
2: 14,
3: 14,
4: 14,
5: 14,
6: 14,
7: 14,
8: 14,
9: 13,
10: 14,
11: 14,
12: 13,
13: 14,
14: 14,
15: 14,
16: 14,
17: 13,
18: 14,
19: 13},
'a': {0: 0.0,
1: 1.0,
2: 1.0,
3: 1.0,
4: 0.0,
5: 0.0,
6: 1.0,
7: 1.0,
8: 1.0,
9: 1.0,
10: 1.0,
11: 0.0,
12: 0.0,
13: 0.0,
14: 0.0,
15: 1.0,
16: 1.0,
17: 1.0,
18: 0.0,
19: 0.0},
'b': {0: 24.5,
1: 24.0,
2: 23.5,
3: 23.5,
4: 23.5,
5: 26.5,
6: 26.5,
7: 53.5,
8: 62.0,
9: 24.0,
10: 64.5,
11: 64.5,
12: 24.0,
13: 63.5,
14: 62.0,
15: 61.0,
16: 60.0,
17: 25.0,
18: 51.5,
19: 50.0},
'Type': {0: 'medium',
1: 'low',
2: 'medium',
3: 'medium',
4: 'medium',
5: 'medium',
6: 'medium',
7: 'medium',
8: 'medium',
9: 'low',
10: 'medium',
11: 'medium',
12: 'low',
13: 'medium',
14: 'medium',
15: 'medium',
16: 'medium',
17: 'low',
18: 'medium',
19: 'medium'}})
Expected output:
Start End X
2000-10-26 10:39:24.310 2000-10-26 17:29:18.770 18
2000-10-26 17:29:40.770 2000-10-26 17:32:35.180 19
How can I do this in a for-loop so that I could further manipulate the data-frame such calculating the mean for each unique value for Y
in an activity?
Edit:
Apologies for any confusion. The activity should be determined by changing in a
values and should by continuous, ie., after one activity ends we reset and count the next. So for example the following data-frame should generate the same output:
Timestamp X Y a b Type
0 2000-10-26 10:08:27.060 18 14 0.0 24.5 medium
1 2000-10-26 10:39:24.310 18 13 1.0 24.0 low
2 2000-10-26 11:50:48.190 18 14 1.0 23.5 medium
3 2000-10-26 17:18:07.610 18 14 1.0 23.5 medium
4 2000-10-26 17:18:09.610 18 14 0.0 23.5 medium
5 2000-10-26 17:29:10.610 18 14 0.0 26.5 medium
6 2000-10-26 17:29:10.770 18 14 1.0 26.5 medium
7 2000-10-26 17:29:12.610 18 14 1.0 53.5 medium
8 2000-10-26 17:29:14.610 18 14 1.0 62.0 medium
9 2000-10-26 17:29:14.770 18 13 1.0 24.0 low
10 2000-10-26 17:29:16.610 18 14 1.0 64.5 medium
11 2000-10-26 17:29:18.770 18 14 0.0 64.5 medium
12 2000-10-26 17:29:18.770 18 13 0.0 24.0 low
13 2000-10-26 17:29:28.770 18 14 0.0 63.5 medium
14 2000-10-26 17:29:34.770 18 14 0.0 62.0 medium
15 2000-10-26 17:29:40.770 18 14 1.0 61.0 medium
16 2000-10-26 17:29:46.770 18 14 1.0 60.0 medium
17 2000-10-26 17:32:01.180 18 13 1.0 25.0 low
18 2000-10-26 17:32:01.180 18 14 0.0 51.5 medium
19 2000-10-26 17:32:35.180 18 13 0.0 50.0 medium
Expected output:
Start End X
2000-10-26 10:39:24.310 2000-10-26 17:29:18.770 18
2000-10-26 17:29:40.770 2000-10-26 17:32:35.180 18
Upvotes: 1
Views: 88
Reputation: 1284
Code:
from datetime import *
from pandas import *
df= pd.DataFrame({'Timestamp': {0: Timestamp('2000-10-26 10:08:27.060000'),
1: Timestamp('2000-10-26 10:39:24.310000'),
2: Timestamp('2000-10-26 11:50:48.190000'),
3: Timestamp('2000-10-26 17:18:07.610000'),
4: Timestamp('2000-10-26 17:18:09.610000'),
5: Timestamp('2000-10-26 17:29:10.610000'),
6: Timestamp('2000-10-26 17:29:10.770000'),
7: Timestamp('2000-10-26 17:29:12.610000'),
8: Timestamp('2000-10-26 17:29:14.610000'),
9: Timestamp('2000-10-26 17:29:14.770000'),
10: Timestamp('2000-10-26 17:29:16.610000'),
11: Timestamp('2000-10-26 17:29:18.770000'),
12: Timestamp('2000-10-26 17:29:18.770000'),
13: Timestamp('2000-10-26 17:29:28.770000'),
14: Timestamp('2000-10-26 17:29:34.770000'),
15: Timestamp('2000-10-26 17:29:40.770000'),
16: Timestamp('2000-10-26 17:29:46.770000'),
17: Timestamp('2000-10-26 17:32:01.180000'),
18: Timestamp('2000-10-26 17:32:01.180000'),
19: Timestamp('2000-10-26 17:32:35.180000')},
'X': {0: 18,
1: 18,
2: 18,
3: 18,
4: 18,
5: 18,
6: 18,
7: 18,
8: 18,
9: 18,
10: 18,
11: 18,
12: 18,
13: 18,
14: 19,
15: 19,
16: 19,
17: 19,
18: 19,
19: 19},
'Y': {0: 14,
1: 13,
2: 14,
3: 14,
4: 14,
5: 14,
6: 14,
7: 14,
8: 14,
9: 13,
10: 14,
11: 14,
12: 13,
13: 14,
14: 14,
15: 14,
16: 14,
17: 13,
18: 14,
19: 13},
'a': {0: 0.0,
1: 1.0,
2: 1.0,
3: 1.0,
4: 0.0,
5: 0.0,
6: 1.0,
7: 1.0,
8: 1.0,
9: 1.0,
10: 1.0,
11: 0.0,
12: 0.0,
13: 0.0,
14: 0.0,
15: 1.0,
16: 1.0,
17: 1.0,
18: 0.0,
19: 0.0},
'b': {0: 24.5,
1: 24.0,
2: 23.5,
3: 23.5,
4: 23.5,
5: 26.5,
6: 26.5,
7: 53.5,
8: 62.0,
9: 24.0,
10: 64.5,
11: 64.5,
12: 24.0,
13: 63.5,
14: 62.0,
15: 61.0,
16: 60.0,
17: 25.0,
18: 51.5,
19: 50.0},
'Type': {0: 'medium',
1: 'low',
2: 'medium',
3: 'medium',
4: 'medium',
5: 'medium',
6: 'medium',
7: 'medium',
8: 'medium',
9: 'low',
10: 'medium',
11: 'medium',
12: 'low',
13: 'medium',
14: 'medium',
15: 'medium',
16: 'medium',
17: 'low',
18: 'medium',
19: 'medium'}})
print(df, '\n')
Y_ended = dict()
look_for_start = True
prv_X = 0
skip_rows_till_same_X = False
for ind,row in df.iterrows():
if skip_rows_till_same_X and prv_X == row['X']:
continue
else:
skip_rows_till_same_X = False
Y_ended[row['Y']] = False if row['a'] else True
if look_for_start:
if row['a']: # Start
start_timestamp = row['Timestamp']
look_for_start = False
Y_ended = Y_ended.fromkeys(Y_ended.keys(), False)
else:
if all(Y_ended.values()):
end_timestamp = row['Timestamp']
look_for_start = True
skip_rows_till_same_X = True
print(start_timestamp, end_timestamp, row['X'])
Y_ended = dict()
prv_X = row['X']
Output:
Timestamp X Y a b Type
0 2000-10-26 10:08:27.060 18 14 0.0 24.5 medium
1 2000-10-26 10:39:24.310 18 13 1.0 24.0 low
2 2000-10-26 11:50:48.190 18 14 1.0 23.5 medium
3 2000-10-26 17:18:07.610 18 14 1.0 23.5 medium
4 2000-10-26 17:18:09.610 18 14 0.0 23.5 medium
5 2000-10-26 17:29:10.610 18 14 0.0 26.5 medium
6 2000-10-26 17:29:10.770 18 14 1.0 26.5 medium
7 2000-10-26 17:29:12.610 18 14 1.0 53.5 medium
8 2000-10-26 17:29:14.610 18 14 1.0 62.0 medium
9 2000-10-26 17:29:14.770 18 13 1.0 24.0 low
10 2000-10-26 17:29:16.610 18 14 1.0 64.5 medium
11 2000-10-26 17:29:18.770 18 14 0.0 64.5 medium
12 2000-10-26 17:29:18.770 18 13 0.0 24.0 low
13 2000-10-26 17:29:28.770 18 14 0.0 63.5 medium
14 2000-10-26 17:29:34.770 19 14 0.0 62.0 medium
15 2000-10-26 17:29:40.770 19 14 1.0 61.0 medium
16 2000-10-26 17:29:46.770 19 14 1.0 60.0 medium
17 2000-10-26 17:32:01.180 19 13 1.0 25.0 low
18 2000-10-26 17:32:01.180 19 14 0.0 51.5 medium
19 2000-10-26 17:32:35.180 19 13 0.0 50.0 medium
2000-10-26 10:39:24.310000 2000-10-26 17:29:18.770000 18
2000-10-26 17:29:40.770000 2000-10-26 17:32:35.180000 19
Upvotes: 1
Reputation: 14094
Here is a one possible solution, the issue is the end
can be two values based on your constraints
# Create dict for the solution
expected_dict = {'start': [], 'end': [], 'X': [], 'b_means': []}
# Loop through each group
for name, gp in df.groupby('X'):
gp['y_shift'] = gp['Y'].shift(-1).ffill().astype(int)
astart = gp[gp['a'].eq(1.0)].iloc[0]['pd.Timestamp']
# End logic that can be updated
aend = gp[gp['a'].eq(0.0) & gp['Y'].ne(gp['y_shift'])].iloc[-1]['pd.Timestamp']
b_means = gp['b'].mean()
# Collect all values
expected_dict ['start'].append(astart)
expected_dict ['end'].append(aend)
expected_dict ['X'].append(name)
expected_dict ['b_means'].append(name)
edf = pd.DataFrame(expected_dict)
start end X b_means
0 2000-10-26 10:39:24.310 2000-10-26 17:29:18.770 18 37.714286
1 2000-10-26 17:29:40.770 2000-10-26 17:32:01.180 19 51.583333
Upvotes: 1