Reputation: 1906
Is there an efficient way to filter out values that lie more than 2.5 standard deviations from the mean within a pipeline? I currently calculate the cutoff values outside the pipe and then filter on them inside the pipe. I'm sure there must be a more efficient way to accomplish this.
set.seed(125)
nd <- data.frame( x = rnorm(1000, 3, .1))
My current method
sdx <- sd(nd$x) * 2.5 + mean(nd$x)
sdx1 <- sd(nd$x) * -2.5 + mean(nd$x)
library(tidyverse)
nd %>% filter(x < sdx, x > sdx1) %>% .$x %>% hist
Upvotes: 3
Views: 1765
Reputation: 146050
You can rearrange your equation with an abs() to simplify it and only call sd() once:
... %>%
filter(abs(x - mean(x)) < 2.5 * sd(x))
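# or use the built-in `scale()` function
# (scale() returns a one-column matrix; as.numeric() turns it back into a plain vector)
... %>%
filter(abs(as.numeric(scale(x))) < 2.5)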
# or, as in comments, use between
# (the bounds are centered on mean(x), not on x itself)
... %>%
filter(between(x, mean(x) - 2.5 * sd(x), mean(x) + 2.5 * sd(x)))
# or combine scale() with between instead of abs()
... %>%
filter(between(as.numeric(scale(x)), -2.5, 2.5))
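For a version you can run end to end, here is a minimal sketch applied to the `nd` data from the question. The helper `within_sd()` is a hypothetical name introduced only for this illustration (it is not part of dplyr); it wraps the abs() variant so the cutoff is computed inside the pipe.

```r
library(dplyr)

# Data from the question
set.seed(125)
nd <- data.frame(x = rnorm(1000, 3, .1))

# Hypothetical helper (not a dplyr function):
# TRUE where x lies strictly within k standard deviations of its mean
within_sd <- function(x, k = 2.5) {
  abs(x - mean(x)) < k * sd(x)
}

# The cutoff is computed inside filter(), so nothing is precomputed outside the pipe
kept <- nd %>% filter(within_sd(x))

# The question's two-threshold approach, for comparison
upper <- mean(nd$x) + 2.5 * sd(nd$x)
lower <- mean(nd$x) - 2.5 * sd(nd$x)
manual <- nd %>% filter(x < upper, x > lower)

# Row counts should agree, barring floating-point ties exactly at a boundary
nrow(kept)
nrow(manual)
```

Because mean() and sd() are evaluated inside filter(), the same call would compute the cutoffs per group if the data frame were grouped with group_by() first.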
Upvotes: 2