Reputation: 33
I have a fairly large dataset, and I need to determine the maximum value of each row, from several columns. So in the below sample data, for "II" what the highest value is, and if the highest value is in "N" or "P". I know very similar questions to this have been posted previously, however I need the output to not remove the other metadata columns in my dataset. This also means I need to specify the range of columns which should be included in the "max" query. Thanks in advance for any guidance with this.
data<-data_frame(Exp = c("I", "II", "III", "IV", "V", "VI", "VII", "VIII"),
N = c(8.77, 1.67, 7.47, 7.58, 1.1, 8.9, 7.5, 7.7),
P = c(1.848, 3.029, 1.925, 2.725, 1.900, 3.100,
2.000, 9.800))
I have tried several variations of the below code
test %>%
mutate(Max = pmax(!!! rlang::syms(names(.)[c("N", "P"),]))) %>%
group_by(data, Exp) %>%
summarise(Max = max(Max))
and receive the error: Error in UseMethod("mutate_") : no applicable method for 'mutate_' applied to an object of class "function"
This is my first question asked on here, so apologies for any incorrect formatting etc, any advice on this (and my question) would be much appreciated.
Upvotes: 1
Views: 916
Reputation: 131
Looking to solve the same problem I found a different solution, that is clearer to me.
cur_data
returns the current working group.
rowwise
can have columns specified which work like groups while using summarise.
ungroup
is needed to revert to the default column-wise format.
The summarise method drops the non-grouping variables.
# using names
v = c('N', 'P')
data %>% rowwise %>% mutate(m=max(cur_data()[v])) %>% ungroup
# using ranges
start = 8
end = 25
data %>% rowwise %>% mutate(m=max(cur_data()[start:end])) %>% ungroup
# using summarize
data %>% rowwise(Exp) %>% summarize(m=max(cur_data()))
Upvotes: 0
Reputation: 6941
I am considering this in two steps
If you only have two columns N
and P
then this is straightforward to do using case_when
.
data2 = data %>%
mutate(max_val = pmax(N,P)) %>% # find max
mutate(source = case_when(max_val == N ~ "N", # find label
max_val == P ~ "P"))
However, if the number of columns, or the column names, is dynamic then this becomes harder. I have the following working:
cols = c("N", "P") # list of column names to work with
data2 = data %>%
mutate(max_val = pmax(!!!syms(cols))) %>% # find max
mutate(source = NA) # initialize blank labels
# iterate to find labels
data3 = data2
for(c in cols)
data3 = mutate(data3, source = ifelse(is.na(source) & max_val == !!sym(c), c, source))
There is probably a way to combine sym
with case_when
so you do not have to iterate over the labels. If someone finds it, please post an update to this answer.
Upvotes: 1