mina

Reputation: 195

Replacing values in df using index

I am trying to detect outliers in my dataframe and replace them with NAs. I have slightly modified the function from the question "How to repeat the Grubbs test and flag the outliers". The function works well on a vector, but my problem is applying it to a dataframe: it detects the outliers, but I do not know how to get the result back as a dataframe.

What I want as a result is my original dataframe with the detected outliers replaced by NA.

This is what I have tried so far:

library(outliers)
data("rock")

# Function to detect outliers with Grubbs test in a vector
grubbs.flag <- function(vector) {
  outliers <- NULL
  test <- vector
  grubbs.result <- grubbs.test(test)
  pv <- grubbs.result$p.value
  # throw an error if there are too few values for the Grubb's test
  if (length(test) < 3) stop("Grubb's test requires > 2 input values")
  while (pv < 0.05) {
    outliers <- c(outliers, as.numeric(strsplit(grubbs.result$alternative, " ")[[1]][3]))
    test <- vector[!vector %in% outliers]
    # stop if all but two values are flagged as outliers
    if (length(test) < 3) {
      warning("All but two values flagged as outliers")
      break
    }
    grubbs.result <- grubbs.test(test)
    pv <- grubbs.result$p.value
    idx.outlier <- which(vector %in% outliers)
    na.vect <- replace(vector, idx.outlier, NA)
  }
  return(na.vect)
}

# Function to detect outliers with Grubbs test in a dataframe
Grubbs.df <- function(data) {
  grubbs.data <- as.vector(unlist(apply(data, grubbs.flag)))
  return(grubbs.data)
}

Any idea how to make this work?

Upvotes: 1

Views: 168

Answers (1)

Ansjovis86

Reputation: 1555

You should add this before the while loop:

na.vect <- test

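Without it, na.vect is never created when the loop body does not run to completion (for example, when the very first test already gives p >= 0.05, or when the loop breaks early), so return(na.vect) throws an error. Then run the function on your dataframe like this: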

apply(rock,2,grubbs.flag)

The second argument, 2, tells apply to run the function over the columns of the dataframe; use 1 for rows.
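
One caveat: apply() converts the dataframe to a matrix and returns a matrix, so the result is no longer a dataframe. Below is a minimal sketch of two ways to get a dataframe back. To keep it runnable without the outliers package, it uses a hypothetical stand-in, flag.extreme (a simple median/MAD rule, not the Grubbs test), and a made-up toy dataframe. Swapping in grubbs.flag should work the same way, assuming every column is numeric.

```r
# Hypothetical stand-in for grubbs.flag (NOT the Grubbs test): replaces values
# lying more than k median absolute deviations from the median with NA.
flag.extreme <- function(x, k = 5) {
  na.vect <- x                        # initialise the result up front
  centre <- median(x, na.rm = TRUE)
  spread <- mad(x, na.rm = TRUE)
  if (is.na(spread) || spread == 0) return(na.vect)  # nothing can be flagged
  na.vect[which(abs(x - centre) > k * spread)] <- NA
  na.vect
}

# Toy data, made up for illustration
df <- data.frame(a = c(1, 2, 3, 4, 100), b = c(10, 11, 12, 13, 14))

# apply() coerces the dataframe to a matrix and returns a matrix
m <- apply(df, 2, flag.extreme)

# Option 1: convert the matrix back to a dataframe
res1 <- as.data.frame(m)

# Option 2: assign into res2[] so the data.frame class and column names are kept
res2 <- df
res2[] <- lapply(df, flag.extreme)

print(res2)
```

Option 2 works column by column without going through a matrix, so it is probably the safer choice if the dataframe ever gains a non-numeric column that has to be handled separately.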

Upvotes: 4
