Reputation: 343
Say if we have a dataframe looking like this below:
a b c d
22 18 25 9
12 24 6 18
37 8 22 25
24 19 12 27
I would like to create two new columns out of these ones: a) One column storing the name of the column in which each row gets its highest value. b) Another one storing its highest value.
In other words, my desired output would look as follows:
a b c d max_col max_val
22 18 25 9 c 25
12 24 6 18 b 24
37 8 22 25 a 37
24 19 12 27 d 27
How should I do to retrieve this?
Upvotes: 0
Views: 56
Reputation: 101638
Here is another base R option
inds <- max.col(df)
df <- cbind(df,
max_col = names(df)[inds],
max_val = df[cbind(seq_along(inds), inds)]
)
which gives
a b c d max_col max_val
1 22 18 25 9 c 25
2 12 24 6 18 b 24
3 37 8 22 25 a 37
4 24 19 12 27 d 27
Upvotes: 0
Reputation: 887193
We can do this in a vectorized efficient way with max.col
from base R
- gets the position index of the max value in a row, which is used to extract the corresponding column name with [
, and pmax
to return the max
value per row
mcol <- names(df)[max.col(df, 'first')]
mval <- do.call(pmax, df)
df[c('max_col', 'max_val')] <- list(mcol, mval)
-output
df
# a b c d max_col max_val
#1 22 18 25 9 c 25
#2 12 24 6 18 b 24
#3 37 8 22 25 a 37
#4 24 19 12 27 d 27
Or using tidyverse
, we can use the same max.col
and pmax
to get the column names of the max value per row and the max value of the row
library(dplyr)
library(purrr)
df %>%
mutate(max_col = names(cur_data())[max.col(cur_data(), 'first')],
max_val = invoke(pmax, select(cur_data(), where(is.numeric))))
-output
# a b c d max_col max_val
# 1 22 18 25 9 c 25
# 2 12 24 6 18 b 24
# 3 37 8 22 25 a 37
# 4 24 19 12 27 d 27
df <- structure(list(a = c(22L, 12L, 37L, 24L), b = c(18L, 24L, 8L,
19L), c = c(25L, 6L, 22L, 12L), d = c(9L, 18L, 25L, 27L)),
class = "data.frame", row.names = c(NA,
-4L))
Upvotes: 0
Reputation: 39595
It can be also reached reshaping data and merging:
library(tidyverse)
#Code
newdf <- df %>% mutate(id=row_number()) %>%
left_join(
df %>% mutate(id=row_number()) %>%
pivot_longer(-id) %>%
group_by(id) %>% filter(value==max(value)[1]) %>%
rename(max_col=name,max_val=value)
) %>% select(-id)
Output:
a b c d max_col max_val
1 22 18 25 9 c 25
2 12 24 6 18 b 24
3 37 8 22 25 a 37
4 24 19 12 27 d 27
Some data used:
#Data
df <- structure(list(a = c(22L, 12L, 37L, 24L), b = c(18L, 24L, 8L,
19L), c = c(25L, 6L, 22L, 12L), d = c(9L, 18L, 25L, 27L)), class = "data.frame", row.names = c(NA,
-4L))
Upvotes: 1
Reputation: 11584
Does this work:
> library(dplyr)
> df %>% rowwise() %>% mutate(max_col = names(df)[which.max(c_across(a:d))], max_val = max(c_across(a:d)))
# A tibble: 4 x 6
# Rowwise:
a b c d max_col max_val
<dbl> <dbl> <dbl> <dbl> <chr> <dbl>
1 22 18 25 9 c 25
2 12 24 6 18 b 24
3 37 8 22 25 a 37
4 24 19 12 27 d 27
>
Upvotes: 3