Data Mastery

Reputation: 2085

mutate_if - warning message

Hello everybody,

library(dplyr)
library(tibble)
mtcars %>%
  rownames_to_column("modelle") %>%
  mutate_if(~is.numeric(.x) & mean(.x) > 50, ~(.x / 1000))

Warning message:
In mean.default(.x) : argument is not numeric or logical: returning NA

This warning seems to be caused by the character column. The code works, but the warning is still ugly. Did I do something wrong, and what could be done better in this case?

Thank you!

Upvotes: 2

Views: 171

Answers (2)

r2evans

Reputation: 160447

R does not short-circuit the vectorized &, so both is.numeric and mean are run on every column. Since your first column (modelle) is character, mean(.x) cannot handle it, warns, and returns NA.

You don't actually need a vectorized operator here, however, because both sides of the condition are single values. If you change from the vectorized & to the scalar &&, R short-circuits the expression and you get the behavior you want.

mtcars %>%
  rownames_to_column("modelle") %>%
  mutate_if(~is.numeric(.x) && mean(.x) > 50, ~(.x / 1000)) %>%
  head()
#             modelle  mpg cyl  disp    hp drat    wt  qsec vs am gear carb
# 1         Mazda RX4 21.0   6 0.160 0.110 3.90 2.620 16.46  0  1    4    4
# 2     Mazda RX4 Wag 21.0   6 0.160 0.110 3.90 2.875 17.02  0  1    4    4
# 3        Datsun 710 22.8   4 0.108 0.093 3.85 2.320 18.61  1  1    4    1
# 4    Hornet 4 Drive 21.4   6 0.258 0.110 3.08 3.215 19.44  1  0    3    1
# 5 Hornet Sportabout 18.7   8 0.360 0.175 3.15 3.440 17.02  0  0    3    2
# 6           Valiant 18.1   6 0.225 0.105 2.76 3.460 20.22  1  0    3    1

A further demonstration that & does not short-circuit:

mymean <- function(x, ...) {
  if (is.character(x)) {
    message("character?")
    return(Inf) # this is certainly not the right thing to do in general ...
  } else mean(x, ...)
}
mtcars %>%
  rownames_to_column("modelle") %>%
  mutate_if(~is.numeric(.x) & mymean(.x) > 50, ~(.x / 1000)) %>%
  head()
# character?
#             modelle  mpg cyl  disp    hp drat    wt  qsec vs am gear carb
# 1         Mazda RX4 21.0   6 0.160 0.110 3.90 2.620 16.46  0  1    4    4
# 2     Mazda RX4 Wag 21.0   6 0.160 0.110 3.90 2.875 17.02  0  1    4    4
# 3        Datsun 710 22.8   4 0.108 0.093 3.85 2.320 18.61  1  1    4    1
# 4    Hornet 4 Drive 21.4   6 0.258 0.110 3.08 3.215 19.44  1  0    3    1
# 5 Hornet Sportabout 18.7   8 0.360 0.175 3.15 3.440 17.02  0  0    3    2
# 6           Valiant 18.1   6 0.225 0.105 2.76 3.460 20.22  1  0    3    1

If short-circuiting were taking place, mymean would never be called on the character column, so the message would never appear.

I don't think this mymean is a viable replacement here, for a couple of reasons:
(1) The use of Inf was solely to ensure that the condition outside the call to mean worked. If an error or warning occurs where a numeric is expected, one should typically return NA or NaN, not a number, even if you might not consider Inf a real, usable number.
(2) It addresses a symptom, not the problem. The problem is the absence of short-circuiting in vectorized logical expressions.
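The same difference can be seen in base R without dplyr. This is only a minimal sketch: noisy() and the calls vector are hypothetical names made up for illustration, and the function simply records that it was evaluated.

```r
# Hypothetical helper: records each call in `calls`, then returns TRUE.
calls <- character(0)
noisy <- function(label) {
  calls <<- c(calls, label)
  TRUE
}

# Vectorized `&`: the right-hand side is evaluated even though the left is FALSE.
res_single <- FALSE & noisy("single &")

# Scalar `&&`: the left side is FALSE, so the right-hand side is never evaluated.
res_double <- FALSE && noisy("double &&")

# Only the call made through `&` was recorded.
print(calls)
```

Both expressions evaluate to FALSE; the only difference is whether the right-hand side was run at all.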

Upvotes: 4

Héctor Garrido

Reputation: 185

You should use "&&" instead of "&". The former is for scalars (single values) and the latter is for vectors. In your case, both is.numeric(.x) and the comparison with the average are scalars.

library(dplyr)
library(tibble)
mtcars %>%
  rownames_to_column("modelle") %>%
  mutate_if(~is.numeric(.x) && mean(.x) > 50, ~(.x / 1000))
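To see why a scalar condition is enough, it may help to apply the predicate to each column by hand. The following is a rough base R sketch of that idea, not how dplyr is implemented internally; the names df, is_large_numeric and selected are made up for illustration.

```r
# Rebuild the data frame from the question, with the row names as a character column.
df <- data.frame(modelle = rownames(mtcars), mtcars,
                 row.names = NULL, stringsAsFactors = FALSE)

# Hypothetical predicate: each column yields exactly one TRUE or FALSE.
# `&&` stops at is.numeric() for the character column, so mean() is never called on it.
is_large_numeric <- function(x) is.numeric(x) && mean(x) > 50

# One logical value per column.
selected <- vapply(df, is_large_numeric, logical(1))
print(names(df)[selected])

# Rescale only the selected columns.
df[selected] <- lapply(df[selected], function(x) x / 1000)
head(df)
```

For mtcars only disp and hp have a mean above 50, so those are the two columns that get divided by 1000, and no warning is raised for modelle.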

Upvotes: 1
