Reputation: 131
A really simple question, but somehow I am stuck. I have panel data on users' daily tasks. I want to find out how many tasks one user does on average, and how long one user takes per task on average. I would also like to plot this data if possible. I ran the usual descriptive statistics, but they are not quite what I need. The data look roughly like this:
user: 1, 1, 1, 2, 2, 3
task: 1, 1, 2, 3, 4, 5
day: 1, 2, 1, 1, 2, 1
task creation: 1, 1, 1, 4, 4, 3
deadline: 5, 5, 5, 9, 9, 4
      id_task id_user day completion_yesno day_created has_deadline deadline created_before active overdue completed_before
16416   37033    5272  61                0          61            1      172              0      0       0                0
16417   37033    5272  62                0          61            1      172              2      2       0                0
16418   37033    5272  63                0          61            1      172              2      2       0                0
16419   37033    5272  64                0          61            1      172              2      2       0                0
16420   37033    5272  65                0          61            1      172              2      2       0                0
16421   37033    5272  66                0          61            1      172              2      2       0                0
16422   37033    5272  67                0          61            1      172              2      2       0                0
16423   37033    5272  68                0          61            1      172              2      2       0                0
16424   37033    5272  69                0          61            1      172              2      2       0                0
16425   37033    5272  70                0          61            1      172              2      2       0                0
16426   37033    5272  71                0          61            1      172              2      2       0                0
16427   37033    5272  72                0          61            1      172              2      2       0                0
16428   37033    5272  73                0          61            1      172              2      2       0                0
16429   37033    5272  74                0          61            1      172              2      2       0                0
16430   37033    5272  75                0          61            1      172              2      2       0                0
16431   37033    5272  76                0          61            1      172              2      2       0                0
16432   37033    5272  77                0          61            1      172              2      2       0                0
16433   37033    5272  78                0          61            1      172              2      2       0                0
16434   37033    5272  79                0          61            1      172              2      2       0                0
16435   37033    5272  80                0          61            1      172              2      2       0                0
In the small example, one user would work on about 2 tasks on average, but I only found that out by counting.
Upvotes: 0
Views: 73
Reputation: 11399
Keep only the user, task and completion columns. Remove duplicated rows, then group by user and compute the number of completed tasks for each user:
library(dplyr)

# One row per (user, task, completion status), then sum completions per user
df_by_user <- df %>%
  select(id_user, id_task, completion_yesno) %>%
  distinct() %>%
  group_by(id_user) %>%
  summarise(n = sum(completion_yesno))
Then compute the average:
mean(df_by_user$n)
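If what you want is the number of tasks each user worked on (completed or not), plus a rough time per task and a plot, here is a minimal sketch on the toy data from the question. It assumes each row is one user-task-day observation and that the number of days a task appears in the panel is an acceptable proxy for how long it took; the names tasks_per_user, days_per_task and mean_days_per_user are only illustrative.

```r
library(dplyr)

# Toy panel data from the question: one row per user, task and day
df <- data.frame(
  id_user = c(1, 1, 1, 2, 2, 3),
  id_task = c(1, 1, 2, 3, 4, 5),
  day     = c(1, 2, 1, 1, 2, 1)
)

# Number of distinct tasks each user worked on
tasks_per_user <- df %>%
  group_by(id_user) %>%
  summarise(n_tasks = n_distinct(id_task))

# Average number of tasks per user (5 tasks over 3 users here)
mean_tasks <- mean(tasks_per_user$n_tasks)

# Days on which each task appears (assumed proxy for time spent on it)
days_per_task <- df %>%
  group_by(id_user, id_task) %>%
  summarise(n_days = n_distinct(day), .groups = "drop")

# Average days per task, for each user
mean_days_per_user <- days_per_task %>%
  group_by(id_user) %>%
  summarise(mean_days = mean(n_days))

# Simple bar chart of the number of tasks per user
barplot(tasks_per_user$n_tasks,
        names.arg = tasks_per_user$id_user,
        xlab = "User", ylab = "Number of tasks")
```

On the real data you would skip the data.frame() step and use your own df. If completion day minus day_created is a better measure of duration for you, swap that in for n_days.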
Upvotes: 2