John

Reputation: 1947

How to replace two specific groups of values in a data frame at the same time

Consider a data frame with some observations.

We call an observation an outlier if it matches

enter image description here

I want to replace the "big outliers" with

enter image description here

and "small outliers" with

enter image description here

My work so far

Let's take some random data:

set.seed(32)
df1 <- data.frame(c(rnorm(20), -100), c(runif(20), -5), c(rexp(20), 7))
# 1 where the value is a big outlier (scaled value > 3), 0 otherwise
big_outlier_frame <- (scale(df1) > 3) * 1
# 1 where the value is a small outlier (scaled value < -3), 0 otherwise
small_outlier_frame <- (scale(df1) < (-3)) * 1

My idea was to set all big outliers to NA and then replace them.

df1[big_outlier_frame==1]<-NA
df1

library(dplyr)
df1 %>% 
  mutate(across(everything(),  function(x)  ifelse(!is.na(x), x,
                                                       2 * sd(x, na.rm = TRUE) + mean(x, na.rm = TRUE))))

After that I wanted to do the same for the small outliers, but then I found a problem: the mean and standard deviation change once the big outliers have been replaced! So I have to replace both the small and the big outliers at the same time, but I have no idea how this can be done. Could you give me a hand?
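
The shift can be seen on a single column. This is only a minimal sketch: `x` is a hypothetical vector with one large value (not the data above), and the replacement value mean + 2 * sd is taken from the code above.

```r
# Hypothetical column: 20 ordinary values plus one large value
set.seed(1)
x <- c(rnorm(20), 50)

stats_before <- c(mean = mean(x), sd = sd(x))

# First pass: flag the big outlier, set it to NA, then fill it in
z <- (x - mean(x)) / sd(x)
x_step1 <- x
x_step1[z > 3] <- NA
x_step1[is.na(x_step1)] <- mean(x_step1, na.rm = TRUE) + 2 * sd(x_step1, na.rm = TRUE)

stats_after <- c(mean = mean(x_step1), sd = sd(x_step1))

# The two rows differ, so a second pass for small outliers would be
# judged against a different mean and sd than the first pass was
rbind(stats_before, stats_after)
```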

Upvotes: 0

Views: 34

Answers (1)

DS_UNI

Reputation: 2650

Does this return what you have in mind?

library(dplyr)
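df1 %>% mutate(across(everything(), function(x) {
  # scaled values computed once, from the original column;
  # as.numeric() drops the matrix shape that scale() returns
  z <- as.numeric(scale(x))
  ifelse(z < -3, mean(x) - 3 * sd(x),
         ifelse(z > 3, mean(x) + 3 * sd(x), x))
}))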
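
The same one-pass idea can also be sketched in base R. This assumes the bounds mean ± 3 * sd used in this answer and a fresh `df1` without NAs. `cap_outliers` is a hypothetical helper name, and the column names `a`, `b`, `c` are added only for readability.

```r
# Hypothetical helper: replace both kinds of outliers in one pass.
# The mean and sd are computed once, from the original column, so
# replacing big outliers cannot change the bound for small outliers.
cap_outliers <- function(x, k = 3) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  lower <- m - k * s
  upper <- m + k * s
  # pmax/pmin clamp values outside [lower, upper] to the nearest bound
  pmin(pmax(x, lower), upper)
}

set.seed(32)
df1 <- data.frame(a = c(rnorm(20), -100),
                  b = c(runif(20), -5),
                  c = c(rexp(20), 7))

# Apply the helper to every column
df1_capped <- as.data.frame(lapply(df1, cap_outliers))
df1_capped
```

Clamping to the bounds gives the same result as the nested `ifelse`, because a scaled value below -3 is exactly a value below mean - 3 * sd (and likewise for the upper side).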

Upvotes: 1
