user2502836

Reputation: 793

dplyr mutate rowwise max of range of columns

I can use the following to return the maximum of 2 columns

newiris<-iris %>%
 rowwise() %>%
 mutate(mak=max(Sepal.Width,Petal.Length))

What I want to do is find that maximum across a range of columns so I don't have to name each one like this

newiris<-iris %>%
 rowwise() %>%
 mutate(mak=max(Sepal.Width:Petal.Length))

Any ideas?

Upvotes: 59

Views: 54919

Answers (9)

GGAnderson

Reputation: 2210

dplyr now includes the c_across function that works with rowwise() to enable the use of select helpers, like starts_with, ends_with, all_of and where(is.numeric). This makes several broad approaches cleaner to implement in complex data pipelines.

Use a preselected character vector containing column names:

  useCols <- c("Sepal.Width", "Petal.Length")
  newiris<-iris %>%
     rowwise() %>%
     mutate(mak = max(c_across(all_of(useCols))))

Or to select columns programmatically using column names, combine with starts_with, ends_with, contains, matches and num_range:

  newiris<-iris %>%
     rowwise() %>%
     mutate(mak = max(c_across(starts_with("Sepal"))))

Or to select columns based on content, combine with where:

  newiris<-iris %>%
     rowwise() %>%
     mutate(mak = max(c_across(where(~is.numeric(.x) && mean(.x) < 5))))
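
A quick way to sanity-check a rowwise() result is to compare it with a vectorised pmax() over the same columns. The following is only a sketch: rowwise_result and vectorised_result are names made up for the comparison, and it assumes dplyr 1.0.0 or later so that c_across() is available.

```r
library(dplyr)

useCols <- c("Sepal.Width", "Petal.Length")

# Row-wise approach: max() is evaluated once per row
rowwise_result <- iris %>%
  rowwise() %>%
  mutate(mak = max(c_across(all_of(useCols)))) %>%
  ungroup()

# Vectorised equivalent: pass the selected columns to pmax() in one call
vectorised_result <- iris %>%
  mutate(mak = do.call(pmax, iris[useCols]))

# Both approaches should produce the same column
all.equal(rowwise_result$mak, vectorised_result$mak)
```

On large data frames the vectorised form is usually the faster of the two, since rowwise() evaluates the expression separately for every row.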

Upvotes: 0

Julian

Reputation: 9320

To use selection helpers such as contains() or starts_with(), we can use:

library(dplyr)
iris |> 
  mutate(max_value = purrr::pmap_dbl(select(iris, contains("petal")), pmax, na.rm=TRUE))
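
The call above refers to iris by name inside mutate(), so it would not pick up changes made earlier in a longer pipeline. A possible variant, assuming dplyr 1.1.0 or later where pick() is available, selects from the data currently being mutated and hands the columns to pmax() in a single vectorised call. The name result is only illustrative.

```r
library(dplyr)

# pick() selects columns from the data frame currently being mutated,
# so the pipeline does not need to name `iris` a second time.
result <- iris |>
  mutate(
    max_value = do.call(
      pmax,
      c(as.list(pick(contains("petal"))), list(na.rm = TRUE))
    )
  )

head(result$max_value)
```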

Upvotes: 1

AndreasM

Reputation: 952

Here is a base-R solution: A range of column names can be selected with subset(). The rowwise maximum values can be added with a combination of transform() and apply().

newiris <- transform(iris, mak = apply(subset(iris, select=Sepal.Width:Petal.Length), 1, max))
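
As a sketch of a vectorised variant in base R, the selected columns can be passed to pmax() through do.call() instead of looping over rows with apply(). The names cols and newiris_pmax are made up for this example.

```r
# Select the column range once with subset()
cols <- subset(iris, select = Sepal.Width:Petal.Length)

# do.call() passes each selected column to pmax() as a separate argument,
# which returns the element-wise (i.e. per-row) maximum
newiris_pmax <- transform(iris, mak = do.call(pmax, cols))

head(newiris_pmax, 3)
```

Unlike apply(), this does not convert the data frame to a matrix first, which may matter when the data frame is large.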

Upvotes: 0

arho

Reputation: 291

Currently (dplyr 1.0.2), this works:

newiris<-iris %>%
 rowwise() %>%
 mutate(mak=max(c_across(Sepal.Width:Petal.Length)))

This also lets you use selection helpers (starts_with, etc.).

Upvotes: 19

Richard DiSalvo

Reputation: 911

One approach is to pipe the data into select and then apply pmax to each row. This is very similar to @inscaven's answer, which uses do.call. Unfortunately, base R has no rowMaxs function, so we need a helper that applies pmax row by row; below I use purrr::pmap.

library(dplyr)
library(purrr)

# to get the value of the max
iris$rowwisemax <- iris %>% select(Sepal.Width:Petal.Length) %>% pmap(pmax) %>% as.numeric

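# to get the argmax (ties.method = "first" keeps the result reproducible;
# the default breaks ties at random)
iris$whichrowwisemax <- iris %>% select(Sepal.Width:Petal.Length) %>% {names(.)[max.col(., ties.method = "first")]}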

Upvotes: 5

Ben

Reputation: 42343

With rlang and quasiquotation we have another dplyr option. First, get the column names that we want to compute the parallel max for:

iris_cols <- iris %>% select(Sepal.Length:Petal.Width) %>% names()

Then we can use !!! and rlang::syms to compute the parallel max for every row of those columns:

iris %>%
  mutate(mak=pmax(!!!rlang::syms(iris_cols)))
  • rlang::syms takes character input (the column names) and turns each name into a symbol
  • !!! unquotes and splices its argument, here the list of column symbols

Which gives:

    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species mak
1            5.1         3.5          1.4         0.2     setosa 5.1
2            4.9         3.0          1.4         0.2     setosa 4.9
3            4.7         3.2          1.3         0.2     setosa 4.7
4            4.6         3.1          1.5         0.2     setosa 4.6
5            5.0         3.6          1.4         0.2     setosa 5.0

h/t: https://stackoverflow.com/a/47773379/1036500
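
To see what !!! and rlang::syms build, the same splice can be done outside mutate() with rlang::expr(), which returns the constructed call without evaluating it. This is only an illustrative sketch; call_built and mak are names made up here.

```r
library(rlang)

iris_cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Build the call without evaluating it: every column name becomes a symbol
# and is spliced in as a separate argument of pmax()
call_built <- expr(pmax(!!!syms(iris_cols)))
print(call_built)

# Evaluate the call with the columns of iris in scope
mak <- eval_tidy(call_built, data = iris)
head(mak)
```

Printing call_built shows the fully spelled-out pmax() call that mutate() ends up evaluating.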

Upvotes: 29

Scott Olesen

Reputation: 31

It seems like @akrun's answer only addresses the cases where you can type in the names of all the variables, whether that's using mutate directly with mutate(pmax_value=pmax(var1, var2)) or using lazy evaluation with mutate_ and interp via mutate_(interp(~pmax(v1, v2), v1=as.name(var1), v2=as.name(var2))).

I can see two ways to do this if you want to use the colon syntax Sepal.Length:Petal.Width or if you happen to have a vector with the column names.

The first is more elegant. You tidy the data and take the maximum among the values when grouped:

data(iris)
library(dplyr)
library(tidyr)

iris_id = iris %>% mutate(id=1:nrow(.))
iris_id %>%
  gather('attribute', 'value', Sepal.Length:Petal.Width) %>%
  group_by(id) %>%
  summarize(max_attribute=max(value)) %>%
  right_join(iris_id, by='id') %>%
  head(3)
## # A tibble: 3 × 7
##      id max_attribute Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##   <int>         <dbl>        <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
## 1     1           5.1          5.1         3.5          1.4         0.2  setosa
## 2     2           4.9          4.9         3.0          1.4         0.2  setosa
## 3     3           4.7          4.7         3.2          1.3         0.2  setosa

The harder way is to use an interpolated formula. This is good if you have a character vector with the names of the variables to be max'ed over, or if the table is too tall or wide to be tidied.

# Make a character vector of the names of the columns we want to take the
# maximum over
target_columns = iris %>% select(-Species) %>% names
## [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"

# Make a vector of dummy variables that will take the place of the real
# column names inside the interpolated formula
dummy_vars = sapply(1:length(target_columns), function(i) sprintf('x%i', i))
## [1] "x1" "x2" "x3" "x4"

# Paste those variables together to make the argument of the pmax in the
# interpolated formula
dummy_vars_string = paste0(dummy_vars, collapse=',')
## [1] "x1,x2,x3,x4"

# Make a named list that maps the dummy variable names (e.g., x1) to the
# real variable names (e.g., Sepal.Length)
dummy_vars_list = lapply(target_columns, as.name) %>% setNames(dummy_vars)
## $x1
## Sepal.Length
##
## $x2
## Sepal.Width
## 
## $x3
## Petal.Length
##
## $x4
## Petal.Width

# Make a pmax formula using the dummy variables
max_formula = as.formula(paste0(c('~pmax(', dummy_vars_string, ')'), collapse=''))
## ~pmax(x1, x2, x3, x4)

# Interpolate the formula using the named variables
library(lazyeval)
iris %>%
  mutate_(max_attribute=interp(max_formula, .values=dummy_vars_list)) %>%
  head(3)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species max_attribute
## 1          5.1         3.5          1.4         0.2  setosa           5.1
## 2          4.9         3.0          1.4         0.2  setosa           4.9
## 3          4.7         3.2          1.3         0.2  setosa           4.7
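
Since mutate_() and the lazyeval interface are deprecated in current dplyr, the same idea (building a pmax call from a character vector of column names) can be sketched in base R alone. The names pmax_call and max_attribute below are chosen for illustration only.

```r
# Character vector of the columns to take the maximum over
target_columns <- setdiff(names(iris), "Species")

# Build the call pmax(Sepal.Length, Sepal.Width, ...) from the names:
# the first list element is the function, the rest are its arguments
pmax_call <- as.call(c(list(as.name("pmax")), lapply(target_columns, as.name)))
print(pmax_call)

# Evaluate the call with the columns of iris in scope
max_attribute <- eval(pmax_call, iris)
head(cbind(iris, max_attribute), 3)
```

No dummy variable names are needed here, because the real column names are inserted into the call directly.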

Upvotes: 3

inscaven

Reputation: 2584

To select columns without typing their full names when using dplyr, I prefer the select argument of the subset function.

You can get desired result like this:

iris %>% subset(select = 2:4) %>% mutate(mak = do.call(pmax, (.))) %>%
  select(mak) %>% cbind(iris)

Upvotes: 7

akrun

Reputation: 887891

Instead of rowwise(), this can be done with pmax

iris %>%
      mutate(mak=pmax(Sepal.Width,Petal.Length, Petal.Width))

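Maybe we can use interp from library(lazyeval) if we want to reference column names stored in a vector. Note that as.name() converts only the first element of a character vector, so each column needs its own placeholder.

library(lazyeval)
nm1 <- names(iris)[2:4]
iris %>% 
     mutate_(mak= interp(~pmax(v1, v2, v3), v1= as.name(nm1[1]),
                         v2= as.name(nm1[2]), v3= as.name(nm1[3])))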
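
One thing to keep in mind when switching from max() to pmax() is how missing values are handled. A small sketch with made-up vectors a and b (not part of iris):

```r
a <- c(1, NA, 5)
b <- c(2, 3, NA)

# By default, a position is NA in the result if either input is NA there
default_max <- pmax(a, b)

# With na.rm = TRUE, NAs are skipped; a position is NA only if all inputs are NA
narm_max <- pmax(a, b, na.rm = TRUE)

print(default_max)
print(narm_max)
```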

Upvotes: 64
