Reputation: 1860
I have a data.table with many columns. In 4 of those columns I want to replace NA with 0.
I have a working solution:
claimsMonthly[is.na(claim9month),claim9month := 0
][is.na(claim10month),claim10month := 0
][is.na(claim11month),claim11month := 0
][is.na(claim12month),claim12month := 0]
However, this is quite repetitive, so I wanted to shorten it with a loop (not sure if that is the smartest idea though?):
for (i in 9:12){
claimsMonthly[is.na(paste0("claim", i, "month")), paste0("claim", i, "month") := 0]
}
When I run this loop nothing happens. I guess it is due to the fact that paste0() returns the string "claim12month", so I effectively get is.na("claim12month"). The result of that is FALSE, despite the fact that there are NA values in my data. I guess this has something to do with the quotes?
This is not the first time I have had issues with using paste0() or running loops with data.table, so I must be missing something important here.
Any ideas how to fix this?
Upvotes: 6
Views: 3481
Reputation: 887153
We can specify the names of the columns ('nm1') in .SDcols, loop over .SD (Subset of Data.table) with lapply, and replace the NA values with 0 (using replace_na from tidyr):
library(data.table)
library(tidyr)
nm1 <- paste0("claim", 9:12, "month")
setDT(claimsMonthly)[, (nm1) := lapply(.SD, replace_na, 0), .SDcols = nm1]
Or, as @jangorecki mentioned in the comments, nafill from data.table would be better:
setDT(claimsMonthly)[, (nm1) := lapply(.SD, nafill, fill = 0), .SDcols = nm1]
Or use a loop with set, assigning 0 to the rows that are NA in each column of interest by specifying i (the row index) and j (the column index/name):
for (j in nm1) {
  set(claimsMonthly, i = which(is.na(claimsMonthly[[j]])), j = j, value = 0)
}
Or with setnafill, which fills the columns by reference:
setnafill(claimsMonthly, cols = nm1, fill = 0)
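To make the diagnosis concrete, here is a minimal sketch on made-up data (the values and the id column are assumptions for illustration only; the claim column names mirror the question). It shows that is.na() applied to the pasted string is FALSE, which is why the original loop was a no-op, and that looking the column up by name with [[ ]] inside the set loop fills the NA values.

```r
library(data.table)

# Made-up data purely for illustration; only the claim* names come from the question.
claimsMonthly <- data.table(
  id           = 1:3,
  claim9month  = c(1, NA, 3),
  claim10month = c(NA, 2, NA),
  claim11month = c(4, 5, 6),
  claim12month = c(NA, NA, 9)
)
nm1 <- paste0("claim", 9:12, "month")

# Why the original loop did nothing: is.na() is given a length-1 string,
# not the column with that name, so the row filter is simply FALSE.
is.na(paste0("claim", 12, "month"))

# Look each column up by name with [[ ]] so is.na() tests the actual values,
# then overwrite only the NA rows by reference.
for (j in nm1) {
  set(claimsMonthly, i = which(is.na(claimsMonthly[[j]])), j = j, value = 0)
}

# Check that no NA remains in the target columns.
vapply(nm1, function(j) anyNA(claimsMonthly[[j]]), logical(1))
```

The same idea applies inside `[`: the string has to be turned into the column itself (for example via claimsMonthly[[j]]) before is.na() can see the data.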
Upvotes: 7
Reputation: 1114
You can use the following (note that 9:12 here selects columns by position, not by the number in their names):
claimsMonthly[, 9:12][is.na(claimsMonthly[, 9:12])] <- 0
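You can also use the variable names (this form of indexing assumes a plain data.frame; on a data.table, a character vector in the first argument is treated as a row lookup, not a column selection):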
claimsMonthly[c("claim9month", "claim10month","claim11month","claim12month")][is.na(claimsMonthly[c("claim9month", "claim10month","claim11month","claim12month")])] <- 0
Or, even better, you can build a vector of all the variable names that match the "claimXXmonth" pattern and index with that.
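A minimal sketch of that pattern-based variant, assuming a plain data.frame; the data, the name claims_df and the regular expression are made up for illustration:

```r
# Made-up data.frame for illustration only.
claims_df <- data.frame(
  id           = 1:3,
  claim9month  = c(1, NA, 3),
  claim10month = c(NA, 2, NA),
  other        = c("a", NA, "c")
)

# Collect every column whose name looks like claim<digits>month.
claim_cols <- grep("^claim[0-9]+month$", names(claims_df), value = TRUE)

# Replace NA with 0 in those columns only; 'other' keeps its NA.
claims_df[claim_cols][is.na(claims_df[claim_cols])] <- 0
```

Selecting by name pattern avoids hard-coding column positions, so the code keeps working if columns are added or reordered.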
Upvotes: 0