Reputation: 25
I'm totally new to R, and I've been trying to replace the NA values with the mean value for each column. I've tried a lot of options, but none seems to work. I've tried this one and many similar ones, but I keep getting: argument is not numeric or logical: returning NA.
script<-function() {
for (i in names(data)) {
data[[i]][is.na(data[[i]])] <- mean(data[[i]], na.rm=TRUE);
}
}
But then after a while I thought I'd just count the columns and came up with this:
script<-function() {
for (i in 1:20) {
data[[i]][is.na(data[[i]])] <- mean(data[[i]], na.rm=TRUE);
}
}
which doesn't show any errors, but doesn't seem to work either. When I type in data, I get the same data frame back, unchanged. Could anyone help me with this?
Upvotes: 0
Views: 129
Reputation: 19783
Feel free to make a function out of this (updated per mnel correction):
data.frame(lapply(data, function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))))
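To make the one-liner concrete, here is a minimal sketch on a small made-up data frame. The column names and values are assumptions for illustration only, not the asker's data.

```r
# Hypothetical example data; the column names are made up for illustration.
data <- data.frame(a = c(1, NA, 3), b = c(NA, 4, 8))

# Replace each column's NA entries with that column's mean,
# then rebuild a data frame from the resulting list.
filled <- data.frame(lapply(data, function(x) {
  replace(x, is.na(x), mean(x, na.rm = TRUE))
}))

print(filled)
```

Note that the original data is not modified; the result has to be assigned to a name (here filled) to be kept.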
Upvotes: 0
Reputation: 115392
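The problem is that your loop is wrapped in a function, so the assignment only updates a local copy of data within the scope of the function. The data object in your workspace is never touched, and the function does not return the modified copy. Running the loop at the top level, not within a function, will work as you wish:
for (i in names(data)) {
data[[i]][is.na(data[[i]])] <- mean(data[[i]], na.rm=TRUE);
}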
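The scoping behaviour can be checked directly. The following is a small sketch, assuming a made-up one-column data frame; the names after_function and after_loop are introduced here only to record the results.

```r
# Hypothetical example data with a single NA.
data <- data.frame(a = c(1, NA, 3))

# Same shape as the function in the question: it assigns to `data`
# inside the function body, which creates and modifies a local copy.
script <- function() {
  for (i in names(data)) {
    data[[i]][is.na(data[[i]])] <- mean(data[[i]], na.rm = TRUE)
  }
}
script()
after_function <- anyNA(data)  # global data still contains the NA

# The same loop at the top level modifies the global object.
for (i in names(data)) {
  data[[i]][is.na(data[[i]])] <- mean(data[[i]], na.rm = TRUE)
}
after_loop <- anyNA(data)

print(c(after_function = after_function, after_loop = after_loop))
```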
Another approach would be to pass data as an argument and return the modified copy:
imputeMean <- function(data) {
  for (i in names(data)) {
    data[[i]][is.na(data[[i]])] <- mean(data[[i]], na.rm = TRUE)
  }
  return(data)
}
# then you can save the result as a new object
updatedData <- imputeMean(data)
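Note that for named lists (as data is), [[<- will copy every time, so you could get around this by using lapply. Keep in mind that lapply returns a plain list, so wrap the result in data.frame() if you need a data frame back: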
updatedData <- lapply(data, function(x) replace(x, is.na(x), mean(x, na.rm = TRUE)))
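The warning quoted in the question (argument is not numeric or logical: returning NA) is what mean() emits when it is given a non-numeric column such as character data. One way to avoid it is to impute only the numeric columns. The sketch below assumes a made-up data frame, and imputeNumericMean is a hypothetical helper name, not something from the answers above.

```r
# Hypothetical data with one numeric and one character column.
data <- data.frame(
  score = c(10, NA, 30),
  group = c("x", NA, "z"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fill NAs with the column mean, numeric columns only.
imputeNumericMean <- function(df) {
  # TRUE for each column that mean() can handle without a warning.
  numeric_cols <- vapply(df, is.numeric, logical(1))
  # Assigning into df[numeric_cols] keeps df a data frame and
  # leaves the non-numeric columns untouched.
  df[numeric_cols] <- lapply(df[numeric_cols], function(x) {
    replace(x, is.na(x), mean(x, na.rm = TRUE))
  })
  df
}

updatedData <- imputeNumericMean(data)
print(updatedData)
```

Non-numeric columns keep their NA values here; how to fill those (for example, with the most frequent value) is a separate decision.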
Upvotes: 5