GenieV
GenieV

Reputation: 33

Identifying maximum value in a row, from multiple columns, with an output including all columns in the dataset?

I have a fairly large dataset, and I need to determine the maximum value of each row, from several columns. So in the below sample data, for "II" what the highest value is, and if the highest value is in "N" or "P". I know very similar questions to this have been posted previously, however I need the output to not remove the other metadata columns in my dataset. This also means I need to specify the range of columns which should be included in the "max" query. Thanks in advance for any guidance with this.

data<-data_frame(Exp = c("I", "II", "III", "IV", "V", "VI", "VII", "VIII"),
                  N = c(8.77, 1.67, 7.47, 7.58, 1.1, 8.9, 7.5, 7.7),
                  P = c(1.848, 3.029, 1.925, 2.725, 1.900, 3.100, 
                                    2.000, 9.800))

I have tried several variations of the below code

test %>% 
  mutate(Max = pmax(!!! rlang::syms(names(.)[c("N", "P"),]))) %>% 
  group_by(data, Exp) %>% 
  summarise(Max = max(Max))

and receive the error: Error in UseMethod("mutate_") : no applicable method for 'mutate_' applied to an object of class "function"

This is my first question asked on here, so apologies for any incorrect formatting etc, any advice on this (and my question) would be much appreciated.

Upvotes: 1

Views: 916

Answers (2)

Hachiloni
Hachiloni

Reputation: 131

Looking to solve the same problem I found a different solution, that is clearer to me.
cur_data returns the current working group.
rowwise can have columns specified which work like groups while using summarise.
ungroup is needed to revert to the default column-wise format.
The summarise method drops the non-grouping variables.

# using names    
v = c('N', 'P')
data %>% rowwise %>% mutate(m=max(cur_data()[v])) %>% ungroup

# using ranges
start = 8
end = 25
data %>% rowwise %>% mutate(m=max(cur_data()[start:end])) %>% ungroup

# using summarize
data %>% rowwise(Exp) %>% summarize(m=max(cur_data()))

Upvotes: 0

Simon.S.A.
Simon.S.A.

Reputation: 6941

I am considering this in two steps

  1. find the max value of columns
  2. find label that matches the max value (assume not equal values)

If you only have two columns N and P then this is straightforward to do using case_when.

data2 = data %>%
  mutate(max_val = pmax(N,P)) %>%                  # find max
  mutate(source = case_when(max_val == N ~ "N",    # find label
                            max_val == P ~ "P"))

However, if the number of columns, or the column names, is dynamic then this becomes harder. I have the following working:

cols = c("N", "P")    # list of column names to work with

data2 = data %>%
  mutate(max_val = pmax(!!!syms(cols))) %>%   # find max
  mutate(source = NA)                         # initialize blank labels

# iterate to find labels
data3 = data2
for(c in cols)
  data3 = mutate(data3, source = ifelse(is.na(source) & max_val == !!sym(c), c, source))

There is probably a way to combine sym with case_when so you do not have to iterate over the labels. If someone finds it, please post an update to this answer.

Upvotes: 1

Related Questions