Reputation: 896
I'm trying to round numbers in multiple columns using different thresholds based on the value. Specifically, I want to round to an integer if the absolute value is larger than 1 and to three decimal places otherwise. I've tried a few different strategies by following answers to similar questions, but none of them seems to work. Here's a reproducible example.
df <- structure(list(dep = c("cyl", "cyl", "disp", "disp", "drat",
"drat", "hp", "hp", "mpg", "mpg"), name = c("estimate", "t_stat",
"estimate", "t_stat", "estimate", "t_stat", "estimate", "t_stat",
"estimate", "t_stat"), dat1 = c(1.15052520023357, 6.68591106097725,
102.901631449292, 12.1072688820387, -0.422439347353398, -5.23657414425551,
37.5762984208224, 5.06741973124599, -5.05739510901596, -8.18496613472796
), dat2 = c(1.27442224382304, 8.42316433209027, 106.428896001266,
12.147509560065, -0.393755429958381, -5.30373672190043, 38.64345279421,
6.17204732384094, -4.84272702226804, -10.6216411092441), dat3 = c(1.07794895749739,
5.1912094236003, 103.687423254053, 7.78976856569243, -0.19357672324514,
-2.62921011406252, 36.7770360009548, 4.84248650357675, -4.53918562415258,
-7.91010248086649)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
According to these criteria, every number in columns dat1 to dat3 should become an integer except for the values in the fifth row. I've tried the following two approaches, but neither got it done.
df %>% mutate_if( is.numeric(.) == T & abs(.) > 10, round, 0)
Error in Math.data.frame(.) :
non-numeric variable(s) in data frame: dep, name
In the second approach, everything seems to work, but the fifth row is also rounded to 0 digits.
> df %>% mutate_if( ~ is.numeric(.) == T && abs(.) > 1, round, 0)
# A tibble: 10 x 5
   dep   name      dat1  dat2  dat3
   <chr> <chr>    <dbl> <dbl> <dbl>
 1 cyl   estimate     1     1     1
 2 cyl   t_stat       7     8     5
 3 disp  estimate   103   106   104
 4 disp  t_stat      12    12     8
 5 drat  estimate     0     0     0
 6 drat  t_stat      -5    -5    -3
 7 hp    estimate    38    39    37
 8 hp    t_stat       5     6     5
 9 mpg   estimate    -5    -5    -5
10 mpg   t_stat      -8   -11    -8
My real problem involves many columns to mutate, so combining round with mutate_if (or something similar) is strongly preferred. Thanks!
Upvotes: 0
Views: 1846
Reputation: 189
The correct syntax would be:
library(dplyr)
df %>%
  mutate_if(
    is.numeric,
    # element-wise rule: integer if |x| > 1, otherwise 3 decimals
    ~ ifelse(abs(.x) > 1, round(.x), round(.x, 3))
  )
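(The second argument of mutate_if, the predicate, must be a function that is applied to each whole column, here is.numeric. The element-wise condition belongs in the function that does the rounding.)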
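To see the element-wise rule on its own, here is a minimal base R sketch. The helper name round_by_size is hypothetical (it is not part of dplyr or the question), and the input values are copied from the question's dat1 column.

```r
# Hypothetical helper: round to an integer when |x| > 1,
# otherwise round to three decimal places.
round_by_size <- function(x) {
  # ifelse() is vectorised, so the test is made for every element of x
  ifelse(abs(x) > 1, round(x), round(x, 3))
}

# A few values taken from dat1 in the question
x <- c(1.15052520023357, -0.422439347353398, 102.901631449292, -5.23657414425551)
result <- round_by_size(x)
print(result)
```

A named helper like this could also be passed directly as the function argument, e.g. mutate_if(is.numeric, round_by_size), which keeps the pipeline short when many columns are involved.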
Upvotes: 1
Reputation: 851
Try the case_when function from the dplyr package for complex condition handling:
library(dplyr)
df %>%
  mutate_at(.vars = vars(dat1, dat2, dat3),
            .funs = ~ case_when(abs(.x) > 1 ~ round(.x, digits = 0),
                                TRUE ~ round(.x, digits = 3)))
# A tibble: 10 x 5
   dep   name        dat1    dat2    dat3
   <chr> <chr>      <dbl>   <dbl>   <dbl>
 1 cyl   estimate       1       1       1
 2 cyl   t_stat         7       8       5
 3 disp  estimate     103     106     104
 4 disp  t_stat        12      12       8
 5 drat  estimate  -0.422  -0.394  -0.194
 6 drat  t_stat        -5      -5      -3
 7 hp    estimate      38      39      37
 8 hp    t_stat         5       6       5
 9 mpg   estimate      -5      -5      -5
10 mpg   t_stat        -8     -11      -8
What we do here is apply mutate_at to the three variables dat1 to dat3 (specified in the .vars argument) and pass the case_when call as a purrr-style lambda (a one-sided formula). This rounds each value to an integer (i.e., digits = 0) if its absolute value is larger than 1 and to three decimal places otherwise.
Side note:
While this approach is somewhat more verbose, it lets you flexibly adjust which variables the function is applied to and add more complex conditions. If you are sure that you only want to apply the function to numeric variables, you can of course use mutate_if combined with an is.numeric predicate, but keep case_when for the condition handling part:
df %>%
  mutate_if(.predicate = is.numeric,
            .funs = ~ case_when(abs(.x) > 1 ~ round(.x, digits = 0),
                                TRUE ~ round(.x, digits = 3)))
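In dplyr 1.0.0 and later, the scoped verbs mutate_if and mutate_at are superseded by across(). The following is a sketch of the same rule in that style, assuming a recent dplyr version is installed. The data frame toy is a made-up three-row subset using values from the question's dat1 column, not the full example data.

```r
library(dplyr)

# Illustrative data only: three values copied from dat1 in the question
toy <- tibble(
  dep  = c("cyl", "drat", "disp"),
  dat1 = c(1.15052520023357, -0.422439347353398, 102.901631449292)
)

rounded <- toy %>%
  # where(is.numeric) selects the numeric columns; the lambda is applied to each
  mutate(across(where(is.numeric),
                ~ case_when(abs(.x) > 1 ~ round(.x, digits = 0),
                            TRUE ~ round(.x, digits = 3))))

print(rounded)
```

Non-numeric columns such as dep are left untouched, so this should behave like the mutate_if version above.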
Upvotes: 1