cephalopod

Reputation: 1906

How to filter by standard deviation within a pipeline

Is there an efficient way to keep only the values that lie within, say, 2.5 standard deviations of the mean inside a pipeline? Currently I calculate the cutoff values outside the pipe and then use them to filter inside the pipe. I'm sure there must be a more efficient way to accomplish this.

set.seed(125)
nd <- data.frame( x = rnorm(1000, 3, .1))

My current method

sdx <- sd(nd$x) * 2.5 + mean(nd$x)
sdx1 <- sd(nd$x) * -2.5 + mean(nd$x)


library(tidyverse)
nd %>% filter(x < sdx, x > sdx1) %>% .$x %>% hist

Upvotes: 3

Views: 1765

Answers (1)

Gregor Thomas

Reputation: 146050

You can rewrite the two-sided condition with abs(), which simplifies it and calls sd() only once:

... %>%
  filter(abs(x - mean(x)) < 2.5 * sd(x))

# or use the built-in `scale()` function
# (scale() returns a one-column matrix, so convert it to a plain vector)
... %>%
  filter(abs(as.numeric(scale(x))) < 2.5)

# or, as in comments, use between() with bounds centered on the mean
... %>%
  filter(between(x, mean(x) - 2.5 * sd(x), mean(x) + 2.5 * sd(x)))

# or use between() with scale() instead of abs()
... %>%
  filter(between(as.numeric(scale(x)), -2.5, 2.5))
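As a quick check, here is a minimal sketch that applies the abs() version to the data from the question and compares it with the bounds computed outside the pipe. The names `upper`, `lower`, `outside` and `inside` are only illustrative, and it loads dplyr alone rather than the whole tidyverse.

```r
library(dplyr)

set.seed(125)
nd <- data.frame(x = rnorm(1000, 3, .1))

# Bounds computed outside the pipe, as in the question
upper <- mean(nd$x) + 2.5 * sd(nd$x)
lower <- mean(nd$x) - 2.5 * sd(nd$x)
outside <- nd %>% filter(x < upper, x > lower)

# Same condition evaluated entirely inside the pipe
inside <- nd %>% filter(abs(x - mean(x)) < 2.5 * sd(x))

# Both approaches should keep the same values
all.equal(outside$x, inside$x)

# Histogram of the retained values
inside %>% pull(x) %>% hist()
```

Inside filter(), mean(x) and sd(x) are computed on the column as it enters that step, so on a grouped data frame the cutoffs would be calculated per group rather than over the whole column.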

Upvotes: 2
