Reputation: 73
[data.table] I have written a function like this to replace NA with 0 if a column is numeric:
fn.naremove <- function(data){
  for (i in 1:length(data)){
    if (class(data[[i]]) %in% c("numeric", "integer", "integer64")) {
      print(data[, names(data[, i]) := replace(data[, i], is.na(data[, i]), 0)])
    } else {
      print(data)
    }
  }
}
I have a sample data.table like the one below:
dt1<- data.table(C1= c(1, 5, 14, NA, 54), C2= c(9, NA, NA, 3, 42), C3= c(9, 7, 42, 87, NA))
If I call fn.naremove(dt1), it returns this error:
Error in `[.data.table`(data, , i) :
j (the 2nd argument inside [...]) is a single symbol but column name 'i' is not found.
Perhaps you intended DT[, ..i]. This difference to data.frame is deliberate and explained in FAQ 1.1.
If I run the code with the actual column index, it runs smoothly and returns the result I wanted for column number 1:
dt1[, names(dt1[, 1]) := replace(dt1[, 1], is.na(dt1[, 1]), 0)]
C1 C2 C3
1: 1 9 9
2: 5 NA 7
3: 14 NA 42
4: 0 3 87
5: 54 42 NA
Please tell me what I missed or did wrong in my function. Thanks in advance!
Upvotes: 3
Views: 75
Reputation: 72919
You may use replace():
replace(dt1, is.na(dt1), 0)
# C1 C2 C3
# 1: 1 9 9
# 2: 5 0 7
# 3: 14 0 42
# 4: 0 3 87
# 5: 54 42 0
There's also a nice function, set(), that stays in the data.table universe and that we can extend to handle only specific classes.
library(data.table)
dt1 <- cbind(dt1, x=c("a", NA)) ## add a categorical variable
classes <- c("numeric", "integer", "integer64") ## classes whose NAs get replaced
fun <- function(DT) {
  for (j in names(DT)) {
    ## rows to update: NA entries, and only if the column's class is listed
    set(DT, which(is.na(DT[[j]]) & class(DT[[j]]) %in% classes), j, 0)
  }
}
fun(dt1)
dt1
# C1 C2 C3 x
# 1: 1 9 9 a
# 2: 5 0 7 <NA>
# 3: 14 0 42 a
# 4: 0 3 87 <NA>
# 5: 54 42 0 a
Only NAs in columns of the listed classes are replaced. This should be the most efficient approach, since set() modifies the table by reference and no copies are made.
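If your data.table version provides setnafill() (added in 1.12.4, as far as I know), the same in-place replacement can be sketched without an explicit loop. The table dt and the helper num_cols below are made up for illustration. The sketch deliberately uses only double columns, so check how your version treats integer columns before relying on it.

```r
library(data.table)

# Toy table, for illustration only; `x` is a non-numeric column.
dt <- data.table(C1 = c(1, NA, 3), C2 = c(NA, 5, 6), x = c("a", NA, "c"))

# Select the columns to fill by type instead of comparing class strings.
num_cols <- names(dt)[vapply(dt, is.numeric, logical(1))]

# setnafill() updates the selected columns in place, like set() does.
setnafill(dt, type = "const", fill = 0, cols = num_cols)
```

Afterwards the NAs in C1 and C2 are 0, while the NA in the character column x is left untouched because x is not passed in cols.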
Upvotes: 3
Reputation: 388982
Note that names(dt1[, 1]) works, but when you do
i <- 1
names(dt1[, i])
it doesn't work and returns an error:
Error in `[.data.table`(dt1, , i) :
j (the 2nd argument inside [...]) is a single symbol but column name 'i' is not found.
Perhaps you intended DT[, ..i]. This difference to data.frame is deliberate and explained in FAQ 1.1.
The solution is to use ..i, i.e. names(dt1[, ..i]).
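To make the scoping rule concrete, here is a minimal sketch on a toy table (dt and its values are made up for illustration, not taken from the question). It assumes a data.table version that supports the .. prefix.

```r
library(data.table)

# Toy table, for illustration only.
dt <- data.table(C1 = c(1, NA), C2 = c(NA, 4))
i <- 1

# Inside [.data.table, a bare `i` in j is treated as a column name.
# The `..` prefix makes data.table look `i` up in the calling scope instead.
by_prefix <- names(dt[, ..i])

# Two ways to reach the same column by position without the j argument:
by_names <- names(dt)[i]   # column name taken from the names vector
col_vec  <- dt[[i]]        # the column itself, as a plain vector
```

Both by_prefix and by_names hold "C1" here. The [[ form is handy inside loops because it never goes through data.table's j evaluation.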
Another option is to index the column names and values directly:
fn.naremove <- function(data){
  for (i in 1:length(data)){
    if (class(data[[i]]) %in% c("numeric", "integer", "integer64")) {
      print(data[, names(data)[i] := replace(data[[i]], is.na(data[[i]]), 0)])
    } else {
      print(data)
    }
  }
}
fn.naremove(dt1)
Upvotes: 2