Reputation: 53
I have a lot of CSV files of temperature data which I am importing into R to process. These files look like:
ID Date.Time temp1 temp2
1 08/13/17 14:48:18 15.581 -0.423
2 08/13/17 16:48:18 17.510 -0.423
3 08/13/17 18:48:18 15.390 -0.423
Sometimes the temperature readings in columns 3 and 4 are clearly wrong and have to be replaced with NA values. I know that anything over 50 or under -50 is an error. I'd like to just remove these right away. Using
df[, c(3, 4)] <- replace(df[, c(3, 4)], df[, c(3, 4)] > 50, NA)
df[, c(3, 4)] <- replace(df[, c(3, 4)], df[, c(3, 4)] < -50, NA)
works but I don't really want to have to repeat this for every file because it seems messy.
I would like to wrap all this in a function, so I could write something like:
df <- remove.errors(df[, c(3, 4)])
I've tried:
remove.errors <- function(df) {
  df[,] <- replace(df[,], df[,] > 50, NA)
  df[,] <- replace(df[,], df[,] < -50, NA)
}
df <- remove.errors(df[, c(3, 4)])
This replaces the bad values, but unfortunately the result keeps only the 3rd and 4th columns and the first two disappear. I've played around with this code for far too long and tried some other things which didn't work at all.
I know I'm probably missing something basic. Does anyone have any tips on making a function that will replace the values in columns 3 and 4 with NAs without changing the first two columns?
Upvotes: 3
Views: 677
Reputation: 269644
1) Try this. It uses only base R.
clean <- function(x, max = 50, min = -max) replace(x, x > max | x < min, NA)
df[3:4] <- clean(df[3:4])
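To make the effect concrete, here is a self-contained sketch on data resembling the poster's sample; the out-of-range readings (99.9 and -60.0) are made up for illustration:

```r
clean <- function(x, max = 50, min = -max) replace(x, x > max | x < min, NA)

# Made-up data in the shape of the poster's files
df <- data.frame(
  ID = 1:3,
  Date.Time = c("08/13/17 14:48:18", "08/13/17 16:48:18", "08/13/17 18:48:18"),
  temp1 = c(15.581, 99.9, 15.390),
  temp2 = c(-0.423, -0.423, -60.0)
)

# Only columns 3 and 4 are touched; ID and Date.Time pass through unchanged
df[3:4] <- clean(df[3:4])
df
```

This works because comparing a numeric data frame against a scalar yields a logical matrix, and replace() (which is just `x[list] <- values; x`) accepts that matrix as an index.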
1a) Alternately, we could do this (which does not overwrite df):
transform(df, temp1 = clean(temp1), temp2 = clean(temp2))
2) Adding in magrittr we could do this:
library(magrittr)
df[3:4] %<>% { clean(.) }
3) In dplyr we could do this:
library(dplyr)
df %>% mutate_at(3:4, clean)
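Side note, assuming dplyr 1.0 or later: mutate_at() has since been superseded by across(). A self-contained equivalent (the sample df here is made up):

```r
library(dplyr)

clean <- function(x, max = 50, min = -max) replace(x, x > max | x < min, NA)

# Made-up sample readings for illustration
df <- data.frame(temp1 = c(15.5, 99.0), temp2 = c(-0.4, -60.0))

# across() applies clean() to the selected columns inside mutate()
out <- df %>% mutate(across(c(temp1, temp2), clean))
out  # both out-of-range readings are now NA
```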
Upvotes: 3
Reputation: 50678
You need to return df in remove.errors; you can also write the replace statement more succinctly using abs:
remove.errors <- function(df) {
  df[] <- replace(df, abs(df) > 50, NA)
  return(df)
}
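One caveat: abs() requires all-numeric input, so this version assumes the data frame you pass in contains only numeric columns. A minimal usage sketch (the temps data frame is made up):

```r
remove.errors <- function(df) {
  df[] <- replace(df, abs(df) > 50, NA)
  return(df)
}

# Made-up temperature readings; all columns numeric, as the function assumes
temps <- data.frame(temp1 = c(15.581, 99.0), temp2 = c(-0.423, -60.0))
out <- remove.errors(temps)
out  # 99.0 and -60.0 are now NA
```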
Or a cleaner/safer alternative using dplyr that takes care of numeric/non-numeric columns:
library(dplyr)
df %>% mutate_if(is.numeric, funs(replace(., abs(.) > 50, NA)))
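funs() was deprecated in dplyr 0.8; assuming a recent dplyr, the same per-numeric-column idea can be written with across() and where() (the sample df below is made up):

```r
library(dplyr)

# Made-up sample: one numeric and one non-numeric column
df <- data.frame(temp1 = c(15.5, 99.0), Date.Time = c("08/13/17", "08/13/17"))

# where(is.numeric) restricts the replacement to numeric columns,
# mirroring the mutate_if() call above
out <- df %>% mutate(across(where(is.numeric), ~ replace(.x, abs(.x) > 50, NA)))
out
```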
Upvotes: 2
Reputation: 6073
In case you have non-numeric columns in your data.frame, you might want this:
remove_errors <- function(df) {
  numcols <- sapply(df, is.numeric)
  df[, numcols] <- lapply(df[, numcols], function(x) ifelse(abs(x) > 50, NA, x))
  return(df)
}
Here's a test:
set.seed(1234)
mydf <- data.frame(
  a = sample(-100:100, 20, TRUE),
  b = sample(30:70, 20, TRUE),
  c = sample(letters, 20, TRUE),
  stringsAsFactors = FALSE
)
remove_errors(mydf)
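Running that test, the two properties the function promises can be checked directly; a sketch (the out name is mine):

```r
remove_errors <- function(df) {
  numcols <- sapply(df, is.numeric)
  df[, numcols] <- lapply(df[, numcols], function(x) ifelse(abs(x) > 50, NA, x))
  return(df)
}

set.seed(1234)
mydf <- data.frame(
  a = sample(-100:100, 20, TRUE),
  b = sample(30:70, 20, TRUE),
  c = sample(letters, 20, TRUE),
  stringsAsFactors = FALSE
)

out <- remove_errors(mydf)

# Every numeric entry with |x| > 50 became NA...
all(is.na(out$a[abs(mydf$a) > 50]))  # TRUE
# ...and the character column is untouched
identical(out$c, mydf$c)             # TRUE
```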
Upvotes: 2