ail
ail

Reputation: 149

Get maximum value from dataset based aggregate number from another column

I have the dataset as follows, which group by ID:

ID, Activity, Duration
1, Reading,     20
1, Work,        40
1, Reading,     30
2, Home,        50
2, Writing,     30
2, Reading,     20
2, Writing,     30

And I want to get another column that tells us the activity with the highest duration, so person 1 should be reading because it takes 50 minutes, and person 2 should be writing because it takes 60 minutes. Below is an example of the desired output.

ID, Activity, Duration, Max_Actitvity
    1, Reading, 20, Reading
    1, Work,    40, Reading
    1, Reading, 30, Reading
    2, Home,    50, Writing
    2, Writing, 30, Writing
    2, Reading, 20, Writing
    2, Writing, 30, Writing

Upvotes: 1

Views: 58

Answers (1)

Quinten
Quinten

Reputation: 41603

You can use the following code:

df <- read.table(text = "ID, Activity, Duration
1, Reading, 20
1, Work, 40
1, Reading, 30
2, Home, 50
2, Writing, 30
2, Reading, 20
2, Writing, 30", header = TRUE, sep = ",")

library(dplyr)
df %>%
  group_by(ID, Activity) %>%
  mutate(sum_Activity = sum(Duration)) %>%
  group_by(ID) %>%
  mutate(Max_Activity = Activity[which.max(sum_Activity)]) %>%
  select(-sum_Activity) %>%
  ungroup() 
#> # A tibble: 7 × 4
#>      ID Activity   Duration Max_Activity
#>   <int> <chr>         <int> <chr>       
#> 1     1 " Reading"       20 " Reading"  
#> 2     1 " Work"          40 " Reading"  
#> 3     1 " Reading"       30 " Reading"  
#> 4     2 " Home"          50 " Writing"  
#> 5     2 " Writing"       30 " Writing"  
#> 6     2 " Reading"       20 " Writing"  
#> 7     2 " Writing"       30 " Writing"

Created on 2022-07-21 by the reprex package (v2.0.1)

Upvotes: 3

Related Questions