Reputation: 5270
I am trying to get the index of the column that has the highest value among selected columns. When trying with dplyr
, my attempts are not giving me the right result.
library(dplyr);library(magrittr)
DF1 <- data.frame(Factor1 = c(1,2,4),Factor2 = c(3,1,1),Factor3 = c(9,1,0)) %>%
mutate(max_ind = which.max(c(.$Factor1,.$Factor2,.$Factor3))) %>% print
Factor1 Factor2 Factor3 max_ind
1 1 3 9 7
2 2 1 1 7
3 4 1 0 7
Where is the mistake? Why is dplyr
behaving this way. I should probably use rowwise
, but that does not seem to be the best way to go. Any thought of how to do this in base
, tidyverse
or data.table
?
With sapply I am getting this:
DF1 <- data.frame(Factor1 = c(1,2,4),Factor2 = c(3,1,1),Factor3 = c(9,1,0)) %>%
+ mutate(max_ind = which.max(c(Factor1,Factor2,Factor3)),
+ max_ind2 = sapply(X = ., function(x) which.max(c(x[Factor1],x[Factor2],x[Factor3])))) %>% print
Factor1 Factor2 Factor3 max_ind max_ind2
1 1 3 9 7 4
2 2 1 1 7 1
3 4 1 0 7 1
But here I see 4 in the first row while it should be 3.
I am also looking for a solution where we can specify the columns to be used for the comparison (which.max
)
All of base
, purrr::map
and dplyr::mutate
examples work.
#R>DF1 <- data.frame(Factor1 = c(1,2,4,1),Factor2 = c(3,1,1,6),Factor3 = c(9,1,0,4))
#R>DF1 %>% mutate(max_ind_purrr = pmap(.l = list(Factor1,Factor2,Factor3),~which.max(c(...)))) %>% print()
Factor1 Factor2 Factor3 max_ind_purrr
1 1 3 9 3
2 2 1 1 1
3 4 1 0 1
4 1 6 4 2
#R>DF1 %>% mutate(max_ind_dplyr=max.col(DF1[,1:3]))
Factor1 Factor2 Factor3 max_ind_dplyr
1 1 3 9 3
2 2 1 1 1
3 4 1 0 1
4 1 6 4 2
#R>DF1 <- transform(DF1,max_ind_base=apply(DF1[, c('Factor1','Factor2','Factor3')],1,which.max))%>% print
Factor1 Factor2 Factor3 max_ind_base
1 1 3 9 3
2 2 1 1 1
3 4 1 0 1
4 1 6 4 2
Upvotes: 1
Views: 2316
Reputation: 72813
In base R you could do:
DF1 <- transform(DF1, max_ind=apply(DF1, 1, which.max))
However, as wisely pointed out by @DavidArenburg in comments - there's actually the vectorized approach max.col()
.
DF1 <- transform(DF1, max_ind=max.col(DF1))
# Factor1 Factor2 Factor3 max_ind
# Factor1 1 3 9 3
# Factor2 2 1 1 1
# Factor3 4 1 0 1
To get the maximum of specified column names, just do this accordingly on a subset.
DF1 <- transform(DF1, max_ind_subset=max.col(DF1[c("Factor1", "Factor2")]))
# Factor1 Factor2 Factor3 max_ind_subset
# 1 1 3 9 2
# 2 2 1 1 1
# 3 4 1 0 1
DF1 <- structure(list(Factor1 = c(1, 2, 4), Factor2 = c(3, 1, 1), Factor3 = c(9,
1, 0)), class = "data.frame", row.names = c(NA, -3L))
Upvotes: 4
Reputation: 3986
I think you are asking for row-wise comparisons to find the column index that contains the maximum value for that row. This is why sapply is not working as, by default, it will look down the columns. which.max
also deals with vectors - in your case you don't want to return the index within each vector because that refers to the column vector and not the row of the data.frame.
This is basically the difference between the max
function and the pmax
function. A row-wise version of which.max
is max.col
so you could specify:
DF1 %>% mutate(max_ind=max.col(DF1))
You can then choose which columns to specify:
# only considering columns 1 and 2
DF1 %>% mutate(max_ind=max.col(DF1[,1:2]))
Upvotes: 5
Reputation: 3830
Try this using purrr::pmap
:
DF1 <-
data.frame(
Factor1 = c(1, 2, 4),
Factor2 = c(3, 1, 1),
Factor3 = c(9, 1, 0)
) %>%
mutate(max_ind = pmap_int(list(Factor1, Factor2, Factor3), ~which.max(c(...))))
Output:
Factor1 Factor2 Factor3 max_ind
1 1 3 9 3
2 2 1 1 1
3 4 1 0 1
Upvotes: 2