How to replace two specific groups of values in a data frame at the same time

Question

Let's consider a data frame with some observations.

We call an observation an outlier if it matches

I want to replace the "big outliers" with

and the "small outliers" with

My work so far

Let's take some random data:

set.seed(32)
df1<-data.frame(c(rnorm(20),-100),c(runif(20),-5),c(rexp(20),7))
# 1 where the value is a big outlier (z-score above 3), 0 otherwise
big_outlier_frame<-(scale(df1)>3)*1
# 1 where the value is a small outlier (z-score below -3), 0 otherwise
small_outlier_frame<-(scale(df1)<(-3))*1

My idea was to change all big outliers to NA and then replace them.

df1[big_outlier_frame==1]<-NA
df1

library(dplyr)
# Replace each NA with mean + 2 * sd of the remaining values in its column
df1 %>%
  mutate(across(everything(), function(x) ifelse(!is.na(x), x,
                                                 2 * sd(x, na.rm = TRUE) + mean(x, na.rm = TRUE))))

After that I wanted to treat the small outliers in the same way, but then I ran into a problem: the mean and standard deviation change once the big outliers have been replaced! So I have to replace both the small and the big ones at the same time, but I have no idea how to do that. Could you give me a hand?
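The drift is easy to see on a toy vector. This is only a sketch with made-up numbers (the vector `x` is hypothetical, not the data from above): once the big outlier is set to NA, the mean and standard deviation that a second pass would use for the small outliers are no longer the original ones.

```r
# Toy vector (hypothetical data): twenty ordinary values and one big outlier.
x <- c(1:20, 200)

# Statistics of the original data.
mean_before <- mean(x)
sd_before   <- sd(x)

# Mark the big outlier as NA, as in the approach above.
z <- (x - mean_before) / sd_before
x_na <- replace(x, z > 3, NA)

# Statistics after the first replacement step.
mean_after <- mean(x_na, na.rm = TRUE)
sd_after   <- sd(x_na, na.rm = TRUE)

# Both values shrink, so a second pass for small outliers
# would work with different limits than the first pass did.
c(mean_before = mean_before, mean_after = mean_after)
c(sd_before = sd_before, sd_after = sd_after)
```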

DS_UNI · Accepted Answer

Does this return what you have in mind?

library(dplyr)
df1 %>% mutate_all(
  function(x) ifelse(scale(x) < -3, mean(x) - 3*sd(x), 
                     ifelse(scale(x) > 3, mean(x) + 3*sd(x), x)))
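A variant of the same idea, offered as a sketch rather than a definitive implementation: compute the mean and standard deviation once from the untouched column, then clamp to both limits in a single step, so neither replacement can influence the other. The helper name `cap_outliers`, its parameter `k`, and the column names `a`, `b`, `c` are made up for this illustration. One reason to work on the plain vector: `ifelse()` returns a result shaped like its test, and `scale()` returns a one-column matrix, so the nested `ifelse()` version may produce matrix columns; `pmin()` and `pmax()` avoid that.

```r
# Hypothetical helper: pull values lying more than k standard deviations
# from the mean back to the limits mean - k*sd and mean + k*sd.
cap_outliers <- function(x, k = 3) {
  m <- mean(x, na.rm = TRUE)  # computed once, from the original values
  s <- sd(x, na.rm = TRUE)
  pmax(pmin(x, m + k * s), m - k * s)  # upper cap, then lower cap
}

set.seed(32)
df1 <- data.frame(a = c(rnorm(20), -100),
                  b = c(runif(20), -5),
                  c = c(rexp(20), 7))

# Apply the helper to every column; df2[] keeps the data frame structure.
df2 <- df1
df2[] <- lapply(df1, cap_outliers)
df2
```

With dplyr 1.0.0 or later, the same helper can be passed as `df1 %>% mutate(across(everything(), cap_outliers))`; `mutate_all()` still works but has been superseded by `across()`.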
