Reputation: 1166
I think it is easier if I show the problem. I have this numeric data:
MoSold YrSold SalePrice OverallQual OverallCond
1 2 3 208500 7 5
2 5 2 181500 6 8
3 9 3 223500 7 5
4 2 1 140000 7 5
5 12 3 250000 8 5
6 10 4 143000 5 5
Using mutate_at and ifelse, I would like to transform every value in a column if a condition on that column is true (the column mean is higher than some threshold; the code below uses 4). However, when I try to do it with this code
data %>%
mutate_at(vars(MoSold, YrSold, SalePrice, OverallQual, OverallCond),
~(ifelse((mean(., na.rm = T)) > 4, log(.), .))) %>% head()
I get the following data, where every row of a transformed column has the same value:
MoSold YrSold SalePrice OverallQual OverallCond
1 0.6931472 3 12.24769 1.94591 1.609438
2 0.6931472 3 12.24769 1.94591 1.609438
3 0.6931472 3 12.24769 1.94591 1.609438
4 0.6931472 3 12.24769 1.94591 1.609438
5 0.6931472 3 12.24769 1.94591 1.609438
6 0.6931472 3 12.24769 1.94591 1.609438
I would like to get the log of the corresponding value for each row if the condition is true, and the raw value if the condition is false.
I know that one solution is to use a for loop, but I would really like a solution with dplyr/tidyverse.
Thanks in advance
I.
Upvotes: 4
Views: 3298
Reputation: 887088
The issue is related to mean being used as the 'test' for ifelse. The mean comparison gives a single TRUE/FALSE, while the 'yes' and 'no' arguments have the full column length. ifelse returns a result with the length of 'test', so it picks only the first element of 'yes' or 'no', and that single value is then recycled over every row of the column.
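The length mismatch can be seen outside of dplyr. The following is a minimal sketch in base R, using the MoSold values from the question as a plain vector named x (the name is only for illustration):

```r
# MoSold values from the question
x <- c(2L, 5L, 9L, 2L, 12L, 10L)

# The test is a single logical value
test <- mean(x, na.rm = TRUE) > 4
length(test)
# 1

# ifelse() returns something shaped like 'test', so only the
# first element of log(x) comes back
res <- ifelse(test, log(x), x)
length(res)
# 1
res
# 0.6931472
```

Inside mutate_at, that length-one result is what fills the whole column, which matches the repeated 0.6931472 in the question's output.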
Here, we can use if/else instead of ifelse, because if/else takes a single logical value and returns the whole vector:
library(dplyr)
data %>%
mutate_all(~ if(mean(., na.rm = TRUE) > 4) log(.) else .)
In dplyr 1.0.0 and later, an option is mutate with across:
data %>%
mutate(across(everything(),
~ if(mean(., na.rm = TRUE) > 4) log(.) else .))
# MoSold YrSold SalePrice OverallQual OverallCond
#1 0.6931472 3 12.24769 1.945910 1.609438
#2 1.6094379 2 12.10901 1.791759 2.079442
#3 2.1972246 3 12.31717 1.945910 1.609438
#4 0.6931472 1 11.84940 1.945910 1.609438
#5 2.4849066 3 12.42922 2.079442 1.609438
#6 2.3025851 4 11.87060 1.609438 1.609438
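If the condition is reused, the if/else logic can be moved into a small named function. This is only a sketch: log_if_mean_above and its threshold argument are hypothetical names introduced here, not part of dplyr, and the data frame is a two-column subset of the question's data.

```r
library(dplyr)

# Hypothetical helper: log-transform a whole column when its mean
# exceeds 'threshold', otherwise return the column unchanged
log_if_mean_above <- function(x, threshold = 4) {
  if (mean(x, na.rm = TRUE) > threshold) log(x) else x
}

# Two columns taken from the question's data
small <- data.frame(
  MoSold = c(2L, 5L, 9L, 2L, 12L, 10L),
  YrSold = c(3L, 2L, 3L, 1L, 3L, 4L)
)

result <- small %>%
  mutate(across(everything(), ~ log_if_mean_above(.x, threshold = 4)))
```

MoSold has a mean above 4 and is replaced by its log; YrSold has a mean below 4 and is left as is.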
If we want to use ifelse, replicate the single logical value so that 'test', 'yes', and 'no' all have the same length:
data %>%
mutate_at(vars(MoSold, YrSold, SalePrice, OverallQual, OverallCond),
~(ifelse(rep((mean(., na.rm = TRUE)) > 4, n()), log(.), .)))
# MoSold YrSold SalePrice OverallQual OverallCond
#1 0.6931472 3 12.24769 1.945910 1.609438
#2 1.6094379 2 12.10901 1.791759 2.079442
#3 2.1972246 3 12.31717 1.945910 1.609438
#4 0.6931472 1 11.84940 1.945910 1.609438
#5 2.4849066 3 12.42922 2.079442 1.609438
#6 2.3025851 4 11.87060 1.609438 1.609438
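As a quick check that the if/else and replicated ifelse approaches agree, here is a base R sketch on the OverallCond values. Outside of a dplyr verb n() is not available, so length(x) is used instead; the variable names are only illustrative.

```r
# OverallCond values from the question
x <- c(5L, 8L, 5L, 5L, 5L, 5L)

# Single logical value for the whole column
cond <- mean(x, na.rm = TRUE) > 4

# if/else returns the whole vector in one step
via_if <- if (cond) log(x) else x

# ifelse needs the test replicated to the length of x
via_ifelse <- ifelse(rep(cond, length(x)), log(x), x)

all.equal(via_if, via_ifelse)
# TRUE
```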
data <- structure(list(MoSold = c(2L, 5L, 9L, 2L, 12L, 10L), YrSold = c(3L,
2L, 3L, 1L, 3L, 4L), SalePrice = c(208500L, 181500L, 223500L,
140000L, 250000L, 143000L), OverallQual = c(7L, 6L, 7L, 7L, 8L,
5L), OverallCond = c(5L, 8L, 5L, 5L, 5L, 5L)), class = "data.frame",
row.names = c("1",
"2", "3", "4", "5", "6"))
Upvotes: 5