teogj
teogj

Reputation: 343

R: New column storing the name of any of the already existing ones based on condition

Say if we have a dataframe looking like this below:

a    b     c     d
22   18    25    9
12   24    6     18
37   8     22    25
24   19    12    27

I would like to create two new columns out of these ones: a) One column storing the name of the column in which each row gets its highest value. b) Another one storing its highest value.

In other words, my desired output would look as follows:

a    b     c     d    max_col  max_val
22   18    25    9    c        25
12   24    6     18   b        24
37   8     22    25   a        37
24   19    12    27   d        27

How should I do to retrieve this?

Upvotes: 0

Views: 56

Answers (4)

ThomasIsCoding
ThomasIsCoding

Reputation: 101638

Here is another base R option

inds <- max.col(df)
df <- cbind(df,
  max_col = names(df)[inds],
  max_val = df[cbind(seq_along(inds), inds)]
)

which gives

   a  b  c  d max_col max_val
1 22 18 25  9       c      25
2 12 24  6 18       b      24
3 37  8 22 25       a      37
4 24 19 12 27       d      27

Upvotes: 0

akrun
akrun

Reputation: 887193

We can do this in a vectorized efficient way with max.col from base R - gets the position index of the max value in a row, which is used to extract the corresponding column name with [ , and pmax to return the max value per row

mcol <- names(df)[max.col(df, 'first')]
mval <- do.call(pmax, df)
df[c('max_col', 'max_val')] <- list(mcol, mval)

-output

df
#   a  b  c  d max_col max_val
#1 22 18 25  9       c      25
#2 12 24  6 18       b      24
#3 37  8 22 25       a      37
#4 24 19 12 27       d      27

Or using tidyverse, we can use the same max.col and pmax to get the column names of the max value per row and the max value of the row

library(dplyr)
library(purrr)
df %>%
    mutate(max_col = names(cur_data())[max.col(cur_data(), 'first')],
    max_val = invoke(pmax, select(cur_data(), where(is.numeric))))

-output

#     a  b  c  d max_col max_val 
# 1 22 18 25  9       c      25
# 2 12 24  6 18       b      24
# 3 37  8 22 25       a      37
# 4 24 19 12 27       d      27

data

df <- structure(list(a = c(22L, 12L, 37L, 24L), b = c(18L, 24L, 8L, 
19L), c = c(25L, 6L, 22L, 12L), d = c(9L, 18L, 25L, 27L)), 
class = "data.frame", row.names = c(NA, 
-4L))

Upvotes: 0

Duck
Duck

Reputation: 39595

It can be also reached reshaping data and merging:

library(tidyverse)
#Code
newdf <- df %>% mutate(id=row_number()) %>%
  left_join(
    df %>% mutate(id=row_number()) %>%
      pivot_longer(-id) %>%
      group_by(id) %>% filter(value==max(value)[1]) %>%
      rename(max_col=name,max_val=value)
  ) %>% select(-id)

Output:

   a  b  c  d max_col max_val
1 22 18 25  9       c      25
2 12 24  6 18       b      24
3 37  8 22 25       a      37
4 24 19 12 27       d      27

Some data used:

#Data
df <- structure(list(a = c(22L, 12L, 37L, 24L), b = c(18L, 24L, 8L, 
19L), c = c(25L, 6L, 22L, 12L), d = c(9L, 18L, 25L, 27L)), class = "data.frame", row.names = c(NA, 
-4L))

Upvotes: 1

Karthik S
Karthik S

Reputation: 11584

Does this work:

> library(dplyr)
> df %>% rowwise() %>% mutate(max_col = names(df)[which.max(c_across(a:d))], max_val = max(c_across(a:d)))
# A tibble: 4 x 6
# Rowwise: 
      a     b     c     d max_col max_val
  <dbl> <dbl> <dbl> <dbl> <chr>     <dbl>
1    22    18    25     9 c            25
2    12    24     6    18 b            24
3    37     8    22    25 a            37
4    24    19    12    27 d            27
> 

Upvotes: 3

Related Questions