Identifying maximum value in a row, from multiple columns, with an output including all columns in the dataset?

Question

I have a fairly large dataset, and I need to determine the maximum value of each row, from several columns. So in the below sample data, for "II" what the highest value is, and if the highest value is in "N" or "P". I know very similar questions to this have been posted previously, however I need the output to not remove the other metadata columns in my dataset. This also means I need to specify the range of columns which should be included in the "max" query. Thanks in advance for any guidance with this.

data<-data_frame(Exp = c("I", "II", "III", "IV", "V", "VI", "VII", "VIII"),
                  N = c(8.77, 1.67, 7.47, 7.58, 1.1, 8.9, 7.5, 7.7),
                  P = c(1.848, 3.029, 1.925, 2.725, 1.900, 3.100, 
                                    2.000, 9.800))

I have tried several variations of the below code

test %>% 
  mutate(Max = pmax(!!! rlang::syms(names(.)[c("N", "P"),]))) %>% 
  group_by(data, Exp) %>% 
  summarise(Max = max(Max))

and receive the error: Error in UseMethod("mutate_") : no applicable method for 'mutate_' applied to an object of class "function"

This is my first question asked on here, so apologies for any incorrect formatting etc, any advice on this (and my question) would be much appreciated.

Simon.S.A. · Accepted Answer

I am considering this in two steps

find the max value of columns
find label that matches the max value (assume not equal values)

If you only have two columns N and P then this is straightforward to do using case_when.

data2 = data %>%
  mutate(max_val = pmax(N,P)) %>%                  # find max
  mutate(source = case_when(max_val == N ~ "N",    # find label
                            max_val == P ~ "P"))

However, if the number of columns, or the column names, is dynamic then this becomes harder. I have the following working:

cols = c("N", "P")    # list of column names to work with

data2 = data %>%
  mutate(max_val = pmax(!!!syms(cols))) %>%   # find max
  mutate(source = NA)                         # initialize blank labels

# iterate to find labels
data3 = data2
for(c in cols)
  data3 = mutate(data3, source = ifelse(is.na(source) & max_val == !!sym(c), c, source))

There is probably a way to combine sym with case_when so you do not have to iterate over the labels. If someone finds it, please post an update to this answer.

Identifying maximum value in a row, from multiple columns, with an output including all columns in the dataset?

Answers (2)

Related Questions