Reputation: 41
INPUT:-
year max1 max2 max3
2001 10 101 87
2002 103 19 88
2003 21 23 89
2004 27 28 91
OUTPUT:-
YEAR MAX
2001 101
2002 103
2003 89
2004 91
Upvotes: 1
Views: 1215
Reputation: 5204
Here's a slightly different option using dplyr::c_across()
which gives you handy access to tidyselect semantics
library(tidyverse)
d <- structure(list(year = 2001:2004, max1 = c(10L, 103L, 21L, 27L), max2 = c(101L, 19L, 23L, 28L), max3 = c(87L, 88L, 89L, 91L)), class = "data.frame", row.names = c(NA, -4L))
d %>%
rowwise() %>%
mutate(max = max(c_across(starts_with("max"))), .keep = "unused") %>%
ungroup()
#> # A tibble: 4 x 2
#> year max
#> <int> <int>
#> 1 2001 101
#> 2 2002 103
#> 3 2003 89
#> 4 2004 91
For what it's worth there are some performance differences which would probably only matter if your dataset is very large but worth noting. The solution from @Gregor Thomas is by far the fastest.
library(microbenchmark)
microbenchmark(
# Dan Adams
c_across = d %>%
rowwise() %>%
mutate(max = max(c_across(starts_with("max"))), .keep = "unused") %>%
ungroup(),
# MonJeanJean
max_all = d %>%
rowwise() %>%
mutate(MAX = max(max1, max2, max3)) %>%
select(year, MAX),
# Greg
do.call = d %>%
transmute(YEAR = year, MAX = do.call(pmax, unname(across(starts_with(
"max"
))))),
# Gregor Thomas
apply = data.frame(
year = d$year,
max = apply(d[-1], 1, max))
)
#> Unit: microseconds
#> expr min lq mean median uq max neval cld
#> c_across 6928.9 8123.20 9548.682 9187.00 10709.95 17595.6 100 c
#> max_all 7890.5 9016.35 10473.327 10176.90 11366.65 16389.5 100 d
#> do.call 3392.3 3976.20 4609.419 4473.55 4981.30 9282.5 100 b
#> apply 349.1 470.20 567.896 535.05 670.70 1017.7 100 a
Created on 2022-02-08 by the reprex package (v2.0.1)
Upvotes: 2
Reputation: 3336
A vectorized dplyr
solution with pmax()
, which facilitates tidyselect
ion while avoiding rowwise()
inefficiency:
Once you've set everything up
library(dplyr)
# ...
# Code to generate 'your_data'.
# ...
here is your solution for the columns {max1
, max2
, max3
}
your_data %>% transmute(YEAR = year, MAX = pmax(max1, max2, max3))
and more generally for all columns of the form max*
:
your_data %>% transmute(YEAR = year, MAX = do.call(pmax, unname(across(
# Tidy selection of 'max*' columns:
starts_with("max")
))))
At your discretion, you can replace starts_with()
with another selection helper like matches("^max\\d+$")
.
Given your_data
reproduced here
your_data <- structure(
list(
year = c(2001, 2002, 2003, 2004),
max1 = c(10, 103, 21, 27),
max2 = c(101, 19, 23, 28),
max3 = c(87, 88, 89, 91)
),
row.names = c(NA, -4L),
class = "data.frame"
)
this tidy workflow should yield the following data.frame
:
YEAR MAX
1 2001 101
2 2002 103
3 2003 89
4 2004 91
Upvotes: 3
Reputation: 2906
A dplyr
solution:
Data:
df <- fread(" year max1 max2 max3
2001 10 101 87
2002 103 19 88
2003 21 23 89
2004 27 28 91 ")
Code:
library(dplyr)
df %>%
rowwise() %>%
mutate(MAX = max(max1, max2, max3)) %>%
select(year, MAX)
Output:
# A tibble: 4 x 2
# Rowwise:
year MAX
<int> <int>
1 2001 101
2 2002 103
3 2003 89
4 2004 91
Upvotes: 3
Reputation: 145755
This should do it:
OUTPUT = data.frame(
YEAR = INPUT$year,
MAX = apply(INPUT[-1], 1, max)
)
Upvotes: 2