B. Davis
B. Davis

Reputation: 3441

Select a maximum value across rows and columns with grouped data

The data below have an IndID field as well as three columns containing numbers, including NA in some instances, with a varying number of rows for each IndID.

library(dplyr)
n = 10
set.seed(123)
dat <- data.frame(IndID = sample(c("AAA", "BBB", "CCC", "DDD"), n, replace = T),
                  Num1 = c(2,4,2,4,4,1,3,4,3,2),
                  Num2 = sample(c(1,2,5,8,7,8,NA), n, replace = T),
                  Num3 = sample(c(NA, NA,NA,8,7,9,NA), n, replace = T)) %>%
  arrange(IndID)

 head(dat)
  IndID Num1 Num2 Num3
1   AAA    1   NA    7
2   BBB    2   NA   NA
3   BBB    2    7    7
4   BBB    2   NA   NA
5   CCC    3    2    8
6   CCC    3    5   NA

For each IndID, I would like to make a new column Max that contains the maximum value for Num1:Num3. In most instances this involves finding the max value across multiple rows and columns. Within dplyr I am missing the final step (below) and would appreciate any suggestions.

dat %>%
  group_by(IndID) %>%
  mutate(Max = "???")

Upvotes: 1

Views: 1007

Answers (1)

akrun
akrun

Reputation: 886938

An option is pmax to get the rowwise maxs

dat %>%        
   mutate(Max = pmax(Num1, Num2, Num3, na.rm = TRUE))

If there are many columns, we can get the column names, convert it to symbol and then evaluate (!!!)

dat %>%        
    mutate(Max = pmax(!!! rlang::syms(names(.)[-1]), na.rm = TRUE))
# A tibble: 10 x 5
# Groups:   IndID [4]
#   IndID  Num1  Num2  Num3   Max
#   <fct> <dbl> <dbl> <dbl> <dbl>
# 1 AAA       1    NA     7     7
# 2 BBB       2    NA    NA     2
# 3 BBB       2     7     7     7
# 4 BBB       2    NA    NA     2
# 5 CCC       3     2     8     8
# 6 CCC       3     5    NA     5
# 7 DDD       4     8     7     8
# 8 DDD       4     7    NA     7
# 9 DDD       4     1     7     7
#10 DDD       4     1     7     7

If this is to get the max of all 'Num' column grouped by 'IndID', there are multiple ways.

1) From the above step, we can extend it to group by 'IndID' and then take the max of row maxs ('Max')

dat %>%        
    mutate(Max = pmax(!!! rlang::syms(names(.)[-1]), na.rm = TRUE)) %>% 
    group_by(IndID) %>% 
    mutate(Max = max(Max))

2) Another option is to convert the 'wide' format to 'long' with gather, then grouped by 'IndID', get the max of 'val' column and right_join with the original dataset

library(tidyverse)
gather(dat, key, val, -IndID) %>% 
    group_by(IndID) %>% 
    summarise(Max = max(val,na.rm = TRUE)) %>% 
    right_join(dat)

3) Or another option without reshaping into 'long' format would be to nest the dataset after grouping by 'IndID', unlist and get the max of the 'Num' columns

dat %>%
   group_by(IndID) %>%
   nest %>%
   mutate(data =  map(data,  ~ .x %>% 
                                mutate(Max = max(unlist(.), na.rm = TRUE)))) %>% 
   unnest

Upvotes: 4

Related Questions