murali
murali

Reputation: 41

how to get max from multiple columns in R

INPUT:-  
 year max1 max2 max3  
2001   10  101   87  
2002  103   19   88   
2003   21   23   89    
2004   27   28   91  
  

OUTPUT:-

YEAR    MAX

2001     101

2002     103          

2003      89

2004      91   

Upvotes: 1

Views: 1215

Answers (4)

Dan Adams
Dan Adams

Reputation: 5204

Here's a slightly different option using dplyr::c_across() which gives you handy access to tidyselect semantics

library(tidyverse)

d <- structure(list(year = 2001:2004, max1 = c(10L, 103L, 21L, 27L), max2 = c(101L, 19L, 23L, 28L), max3 = c(87L, 88L, 89L, 91L)), class = "data.frame", row.names = c(NA, -4L))

d %>% 
  rowwise() %>% 
  mutate(max = max(c_across(starts_with("max"))), .keep = "unused") %>% 
  ungroup()
#> # A tibble: 4 x 2
#>    year   max
#>   <int> <int>
#> 1  2001   101
#> 2  2002   103
#> 3  2003    89
#> 4  2004    91

Benchmarking

For what it's worth there are some performance differences which would probably only matter if your dataset is very large but worth noting. The solution from @Gregor Thomas is by far the fastest.

library(microbenchmark)
microbenchmark(
# Dan Adams
c_across = d %>% 
  rowwise() %>% 
  mutate(max = max(c_across(starts_with("max"))), .keep = "unused") %>% 
  ungroup(),

# MonJeanJean
max_all = d %>% 
  rowwise() %>% 
  mutate(MAX = max(max1, max2, max3)) %>% 
  select(year, MAX),

# Greg
do.call = d %>%
  transmute(YEAR = year, MAX = do.call(pmax, unname(across(starts_with(
    "max"
  ))))),

# Gregor Thomas
apply = data.frame(
  year = d$year,
  max = apply(d[-1], 1, max))
)
#> Unit: microseconds
#>      expr    min      lq      mean   median       uq     max neval  cld
#>  c_across 6928.9 8123.20  9548.682  9187.00 10709.95 17595.6   100   c 
#>   max_all 7890.5 9016.35 10473.327 10176.90 11366.65 16389.5   100    d
#>   do.call 3392.3 3976.20  4609.419  4473.55  4981.30  9282.5   100  b  
#>     apply  349.1  470.20   567.896   535.05   670.70  1017.7   100 a

Created on 2022-02-08 by the reprex package (v2.0.1)

Upvotes: 2

Greg
Greg

Reputation: 3336

A vectorized dplyr solution with pmax(), which facilitates tidyselection while avoiding rowwise() inefficiency:

Solution

Once you've set everything up

library(dplyr)

# ...
# Code to generate 'your_data'.
# ...

here is your solution for the columns {max1, max2, max3}

your_data %>% transmute(YEAR = year, MAX = pmax(max1, max2, max3))

and more generally for all columns of the form max*:

your_data %>% transmute(YEAR = year, MAX = do.call(pmax, unname(across(
  # Tidy selection of 'max*' columns:
  starts_with("max")
))))

At your discretion, you can replace starts_with() with another selection helper like matches("^max\\d+$").

Results

Given your_data reproduced here

your_data <- structure(
  list(
    year = c(2001, 2002, 2003, 2004),
    max1 = c(10, 103, 21, 27),
    max2 = c(101, 19, 23, 28),
    max3 = c(87, 88, 89, 91)
  ),
  row.names = c(NA, -4L),
  class = "data.frame"
)

this tidy workflow should yield the following data.frame:

  YEAR MAX
1 2001 101
2 2002 103
3 2003  89
4 2004  91

Upvotes: 3

MonJeanJean
MonJeanJean

Reputation: 2906

A dplyr solution:

Data:

df <- fread(" year max1 max2 max3  
2001   10  101   87  
2002  103   19   88   
2003   21   23   89    
2004   27   28   91 ")

Code:

library(dplyr)
df %>% 
  rowwise() %>% 
  mutate(MAX = max(max1, max2, max3)) %>% 
  select(year, MAX)

Output:

# A tibble: 4 x 2
# Rowwise: 
   year   MAX
  <int> <int>
1  2001   101
2  2002   103
3  2003    89
4  2004    91

Upvotes: 3

Gregor Thomas
Gregor Thomas

Reputation: 145755

This should do it:

OUTPUT = data.frame(
  YEAR = INPUT$year,
  MAX = apply(INPUT[-1], 1, max)
)

Upvotes: 2

Related Questions