HermanK
HermanK

Reputation: 309

Pandas count per time (XX:XX:XX) based on condition

With the following df:

Real_time SName VT  on Act_win ratio
11:11:20    x   5   yes zoom    90
11:11:30    x   15  yes chrome  90
11:11:40    x   25  yes zoom    90
11:11:50    x   35  no  zoom    90
11:12:00    x   45  yes zoom    90
11:12:10    x   55  yes zoom    90
11:12:20    x   65  yes zoom    50
11:12:30    x   75  no  zoom    60
11:12:40    x   85  yes chrome  60
11:12:50    x   95  yes zoom    60
11:13:00    x   105 yes zoom    20
11:11:20    y   5   yes zoom    99
11:11:30    y   15  yes zoom    88
11:11:40    y   25  yes chrome  77
11:11:50    y   35  no  chrome  49
11:12:00    y   45  yes zoom    56
11:12:10    y   55  yes zoom    46
11:12:20    y   65  yes zoom    90
11:12:30    y   75  no  zoom    90
11:12:40    y   85  yes zoom    90
11:12:50    y   95  yes zoom    90
11:13:00    y   105 yes chrome  50

I need to group by timestamp and act_win and count instances in which df["act_win"] = "zoom" per timestamp, where 0 occurrences should be counted as 0 (and not have the row dropped).

In addition, I would like to use "mean" for "ratio" on the same condition.

Lastly, I need to do the same thing with multiple conditions, given df["act_win"] = "zoom" and df["on"] = "yes".

I got stuck because I just do a total count and then drop unnecessary rows. However if there where no "zoom" entries in a specific timestamp, it won't work. Unfortunately the answers I have found so far create a factorial column for months which doesn't help me in this situation.

EDIT - expected result for final combined criteria (both df["on"] = "yes" and df["act_win"] = "zoom"):

Real_time count mean_r
11:11:20    2   94.5
11:11:30    1   88
11:11:40    1   90
11:11:50    0   0
11:12:00    2   73
11:12:10    2   68
11:12:20    2   70
11:12:30    0   0
11:12:40    1   90
11:12:50    2   75
11:13:00    1   20

Upvotes: 1

Views: 135

Answers (1)

r-beginners
r-beginners

Reputation: 35115

There may be a better way to do this, but it is calculated by grouping the results against the conditional extraction results. Then I create a data frame from it with a unique timestamp and combine it with

df1 = df[(df['Act_win'] == 'zoom')&(df['on'] == 'yes')].groupby(['Real_time','Act_win']).agg(count=('ratio','count'), mean_r=('ratio','mean'))
df1.reset_index(inplace=True)
df2 = pd.DataFrame({'Real_time':df['Real_time'].unique().tolist()})
df2 = df2.merge(df1, on='Real_time', how='outer')
df2['Act_win'] = df2['Act_win'].fillna(method='ffill')
df2.fillna(0, inplace=True)

df2
    Real_time   Act_win count   mean_r
0   11:11:20    zoom    2.0 94.5
1   11:11:30    zoom    1.0 88.0
2   11:11:40    zoom    1.0 90.0
3   11:11:50    zoom    0.0 0.0
4   11:12:00    zoom    2.0 73.0
5   11:12:10    zoom    2.0 68.0
6   11:12:20    zoom    2.0 70.0
7   11:12:30    zoom    0.0 0.0
8   11:12:40    zoom    1.0 90.0
9   11:12:50    zoom    2.0 75.0
10  11:13:00    zoom    1.0 20.0

Upvotes: 1

Related Questions