Reputation: 1947
Let's consider a data frame with some observations.
We call an observation an outlier if it matches
I want to replace the "big outliers" with
and the "small outliers" with
My work so far
Let's take some random data:
set.seed(32)
df1<-data.frame(c(rnorm(20),-100),c(runif(20),-5),c(rexp(20),7))
# 1 where the value is a big outlier (z-score above 3), 0 otherwise
big_outlier_frame<-(scale(df1)>3)*1
# 1 where the value is a small outlier (z-score below -3), 0 otherwise
small_outlier_frame<-(scale(df1)<(-3))*1
My idea was to set all big outliers to NA and then fill in the replacement values.
df1[big_outlier_frame==1]<-NA
df1
library(dplyr)
df1 %>%
  mutate(across(everything(), function(x) ifelse(!is.na(x), x,
    2 * sd(x, na.rm = TRUE) + mean(x, na.rm = TRUE))))
After that I wanted to handle the small outliers in the same way, but then I ran into a problem: the mean and standard deviation change once the big outliers have been replaced. So I need to replace both the small and the big outliers at the same time, but I have no idea how to do that. Could you give me a hand?
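To make the problem concrete, here is a small sketch on a made-up vector (the vector `x` and the names `x_step`, `before` and `after` are only for illustration, not part of my real data). It replaces the big outlier first, using the same NA-then-fill idea as above, and compares the mean and standard deviation before and after that step:

```r
# Hypothetical toy vector: 19 zeros plus one big and one small outlier
x <- c(rep(0, 19), 100, -100)
z <- as.vector(scale(x))   # z-scores based on the original mean and sd
before <- c(mean = mean(x), sd = sd(x))

# Sequential approach, step 1: blank out the big outlier, then fill it in
x_step <- x
x_step[z > 3] <- NA
x_step[is.na(x_step)] <- mean(x_step, na.rm = TRUE) + 2 * sd(x_step, na.rm = TRUE)
after <- c(mean = mean(x_step), sd = sd(x_step))

# The two rows differ, so a later small-outlier step would use shifted statistics
print(rbind(before, after))
```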
Upvotes: 0
Views: 34
Reputation: 2650
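Does this return what you have in mind?
library(dplyr)
# Both tails are handled in one pass, so mean(x) and sd(x) always come from the original column
df1 %>% mutate(across(everything(), function(x) {
  z <- as.vector(scale(x))  # plain vector of z-scores instead of a one-column matrix
  ifelse(z < -3, mean(x) - 3 * sd(x),
    ifelse(z > 3, mean(x) + 3 * sd(x), x))
}))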
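The same idea can also be written in base R, as a minimal sketch. The helper name `clamp_outliers`, its parameter `k`, and the column names `a`, `b`, `c` are assumptions made for illustration, not something from the question. The point is that the bounds are computed once from the untouched column, so replacing the big outliers cannot affect how the small ones are treated:

```r
# Hypothetical helper: clamp a numeric vector to mean +/- k standard deviations.
clamp_outliers <- function(x, k = 3) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  # Both bounds are fixed before any value is replaced
  lower <- m - k * s
  upper <- m + k * s
  pmin(pmax(x, lower), upper)
}

# Same kind of data as in the question, with assumed column names
set.seed(32)
df1 <- data.frame(a = c(rnorm(20), -100),
                  b = c(runif(20), -5),
                  c = c(rexp(20), 7))

# Apply the helper column by column
df1_clamped <- as.data.frame(lapply(df1, clamp_outliers))
df1_clamped
```

Values whose z-score is below -k end up at mean - k*sd, values above k end up at mean + k*sd, and everything else is left unchanged, which matches the nested ifelse() above for k = 3.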
Upvotes: 1