Reputation: 149
I have the dataset as follows, which group by ID:
ID, Activity, Duration
1, Reading, 20
1, Work, 40
1, Reading, 30
2, Home, 50
2, Writing, 30
2, Reading, 20
2, Writing, 30
And I want to get another column that tells us the activity with the highest duration, so person 1 should be reading because it takes 50 minutes, and person 2 should be writing because it takes 60 minutes. Below is an example of the desired output.
ID, Activity, Duration, Max_Actitvity
1, Reading, 20, Reading
1, Work, 40, Reading
1, Reading, 30, Reading
2, Home, 50, Writing
2, Writing, 30, Writing
2, Reading, 20, Writing
2, Writing, 30, Writing
Upvotes: 1
Views: 58
Reputation: 41603
You can use the following code:
df <- read.table(text = "ID, Activity, Duration
1, Reading, 20
1, Work, 40
1, Reading, 30
2, Home, 50
2, Writing, 30
2, Reading, 20
2, Writing, 30", header = TRUE, sep = ",")
library(dplyr)
df %>%
group_by(ID, Activity) %>%
mutate(sum_Activity = sum(Duration)) %>%
group_by(ID) %>%
mutate(Max_Activity = Activity[which.max(sum_Activity)]) %>%
select(-sum_Activity) %>%
ungroup()
#> # A tibble: 7 × 4
#> ID Activity Duration Max_Activity
#> <int> <chr> <int> <chr>
#> 1 1 " Reading" 20 " Reading"
#> 2 1 " Work" 40 " Reading"
#> 3 1 " Reading" 30 " Reading"
#> 4 2 " Home" 50 " Writing"
#> 5 2 " Writing" 30 " Writing"
#> 6 2 " Reading" 20 " Writing"
#> 7 2 " Writing" 30 " Writing"
Created on 2022-07-21 by the reprex package (v2.0.1)
Upvotes: 3