Stat-R
Stat-R

Reputation: 5270

R: For each row, find the column-index of the column that has the highest value

I am trying to get the index of the column that has the highest value among selected columns. When trying with dplyr, my attempts are not giving me the right result.

library(dplyr);library(magrittr)
DF1 <- data.frame(Factor1 = c(1,2,4),Factor2 = c(3,1,1),Factor3 = c(9,1,0)) %>% 
    mutate(max_ind = which.max(c(.$Factor1,.$Factor2,.$Factor3))) %>% print
          Factor1 Factor2 Factor3 max_ind
        1       1       3       9       7
        2       2       1       1       7
        3       4       1       0       7

Where is the mistake? Why is dplyr behaving this way. I should probably use rowwise, but that does not seem to be the best way to go. Any thought of how to do this in base, tidyverse or data.table?

Edit-1 (some other attempt)

With sapply I am getting this:

DF1 <- data.frame(Factor1 = c(1,2,4),Factor2 = c(3,1,1),Factor3 = c(9,1,0)) %>%
+   mutate(max_ind = which.max(c(Factor1,Factor2,Factor3)),
+          max_ind2 = sapply(X = ., function(x) which.max(c(x[Factor1],x[Factor2],x[Factor3])))) %>% print
  Factor1 Factor2 Factor3 max_ind max_ind2
1       1       3       9       7        4
2       2       1       1       7        1
3       4       1       0       7        1

But here I see 4 in the first row while it should be 3.

Edit-2

I am also looking for a solution where we can specify the columns to be used for the comparison (which.max)

Edit-3

All of base, purrr::map and dplyr::mutate examples work.

#R>DF1 <- data.frame(Factor1 = c(1,2,4,1),Factor2 = c(3,1,1,6),Factor3 = c(9,1,0,4)) 
#R>DF1 %>% mutate(max_ind_purrr = pmap(.l = list(Factor1,Factor2,Factor3),~which.max(c(...)))) %>% print()
  Factor1 Factor2 Factor3 max_ind_purrr
1       1       3       9             3
2       2       1       1             1
3       4       1       0             1
4       1       6       4             2
#R>DF1 %>% mutate(max_ind_dplyr=max.col(DF1[,1:3]))
  Factor1 Factor2 Factor3 max_ind_dplyr
1       1       3       9             3
2       2       1       1             1
3       4       1       0             1
4       1       6       4             2
#R>DF1 <- transform(DF1,max_ind_base=apply(DF1[, c('Factor1','Factor2','Factor3')],1,which.max))%>% print
  Factor1 Factor2 Factor3 max_ind_base
1       1       3       9            3
2       2       1       1            1
3       4       1       0            1
4       1       6       4            2

Upvotes: 1

Views: 2316

Answers (3)

jay.sf
jay.sf

Reputation: 72813

In base R you could do:

DF1 <- transform(DF1, max_ind=apply(DF1, 1, which.max))

However, as wisely pointed out by @DavidArenburg in comments - there's actually the vectorized approach max.col().

DF1 <- transform(DF1, max_ind=max.col(DF1))
#         Factor1 Factor2 Factor3 max_ind
# Factor1       1       3       9       3
# Factor2       2       1       1       1
# Factor3       4       1       0       1

To get the maximum of specified column names, just do this accordingly on a subset.

DF1 <- transform(DF1, max_ind_subset=max.col(DF1[c("Factor1", "Factor2")]))
#   Factor1 Factor2 Factor3 max_ind_subset
# 1       1       3       9              2
# 2       2       1       1              1
# 3       4       1       0              1

Data

DF1 <- structure(list(Factor1 = c(1, 2, 4), Factor2 = c(3, 1, 1), Factor3 = c(9, 
1, 0)), class = "data.frame", row.names = c(NA, -3L))

Upvotes: 4

Chris
Chris

Reputation: 3986

I think you are asking for row-wise comparisons to find the column index that contains the maximum value for that row. This is why sapply is not working as, by default, it will look down the columns. which.max also deals with vectors - in your case you don't want to return the index within each vector because that refers to the column vector and not the row of the data.frame.

This is basically the difference between the max function and the pmax function. A row-wise version of which.max is max.col so you could specify:

DF1 %>% mutate(max_ind=max.col(DF1))

You can then choose which columns to specify:

# only considering columns 1 and 2
DF1 %>% mutate(max_ind=max.col(DF1[,1:2]))

Upvotes: 5

Jack Brookes
Jack Brookes

Reputation: 3830

Try this using purrr::pmap:

DF1 <-
  data.frame(
    Factor1 = c(1, 2, 4),
    Factor2 = c(3, 1, 1),
    Factor3 = c(9, 1, 0)
  ) %>%
  mutate(max_ind = pmap_int(list(Factor1, Factor2, Factor3), ~which.max(c(...))))

Output:

  Factor1 Factor2 Factor3 max_ind
1       1       3       9       3
2       2       1       1       1
3       4       1       0       1

Upvotes: 2

Related Questions