udden2903

Reputation: 783

Using mutate_at and which.max to operate on selected columns of a data frame

I am trying to use a combination of mutate_at and which.max to manipulate a data frame as outlined below.

#This is basically what I want to achieve
df_want <- iris %>% group_by(Species) %>% mutate(Sepal.Length = Sepal.Length[which.max(Petal.Width)],
                                      Sepal.Width = Sepal.Width[which.max(Petal.Width)])

#Here is my attempt at a smarter solution, but it does not work
df_attempt <- iris %>% group_by(Species) %>% mutate_at(c("Sepal.Length", "Sepal.Width"), function(x) x[which.max("Petal.Width")])

#However, this works
df_test <- iris %>% group_by(Species) %>% mutate_at(c("Sepal.Length", "Sepal.Width"), function(x) x + 100)

The code to produce df_attempt does not work. I get the following error message:

Error in mutate_impl(.data, dots) : 
  Column `Sepal.Length` must be length 50 (the group size) or one, not 0

Any ideas how I can get around this while still using mutate_at?

Upvotes: 1

Views: 412

Answers (1)

acylam

Reputation: 18661

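The standard dplyr way would be the explicit mutate from your question. The mutate_at version also works once the columns are wrapped in vars() and the function in funs(), with Petal.Width left unquoted: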

df_want <- iris %>% 
  group_by(Species) %>% 
  mutate(Sepal.Length = Sepal.Length[which.max(Petal.Width)],
         Sepal.Width = Sepal.Width[which.max(Petal.Width)])

df_attempt <- iris %>% 
  group_by(Species) %>% 
  mutate_at(vars(Sepal.Length, Sepal.Width), funs(.[which.max(Petal.Width)]))

Result:

# A tibble: 150 x 5
# Groups:   Species [3]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
 1            5         3.5          1.4         0.2  setosa
 2            5         3.5          1.4         0.2  setosa
 3            5         3.5          1.3         0.2  setosa
 4            5         3.5          1.5         0.2  setosa
 5            5         3.5          1.4         0.2  setosa
 6            5         3.5          1.7         0.4  setosa
 7            5         3.5          1.4         0.3  setosa
 8            5         3.5          1.5         0.2  setosa
 9            5         3.5          1.4         0.2  setosa
10            5         3.5          1.5         0.1  setosa
# ... with 140 more rows

> identical(df_want, df_attempt)
[1] TRUE
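As for why the original attempt fails: which.max("Petal.Width") is applied to the literal string, not to the column. A minimal base R sketch of that behaviour follows; the vector x is a made-up stand-in for one group's Sepal.Length values, not data from the question.

```r
# Hypothetical stand-in for one group's column values
x <- c(5.1, 4.9, 4.7)

# The string cannot be coerced to a number, so it becomes NA (with a warning)
# and which.max() finds no maximum, returning an empty index
idx <- suppressWarnings(which.max("Petal.Width"))

length(idx)     # empty index
length(x[idx])  # indexing with it gives a length-0 vector
```

A length-0 result per group is what the "must be length 50 (the group size) or one, not 0" error is complaining about.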

Note:

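  1. With vars you can select columns using non-standard evaluation (NSE), i.e. as unquoted names.

  2. With funs you can refer to each selected column as ., the same role x plays in function(x). Petal.Width, written unquoted, is then looked up in the grouped data.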
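On newer dplyr versions, note that funs() was deprecated in dplyr 0.8.0 and the scoped verbs such as mutate_at() were superseded by across() in dplyr 1.0.0. A sketch of the equivalent call, assuming dplyr >= 1.0.0 is installed (df_across is just an illustrative name):

```r
library(dplyr)

# Same idea as the mutate_at() call above, written with across().
# .x is the current column; Petal.Width is resolved within each group.
df_across <- iris %>%
  group_by(Species) %>%
  mutate(across(c(Sepal.Length, Sepal.Width),
                ~ .x[which.max(Petal.Width)]))

head(df_across)
```

The setosa rows should show the same Sepal.Length and Sepal.Width values as in the result printed above.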

Upvotes: 2
