Nneka

Reputation: 1860

Replacing all NA with 0 in a data.table in R

I have a data.table with many columns. There are 4 columns where I want to replace NA with 0.

I have a working solution:

  claimsMonthly[is.na(claim9month),claim9month := 0
          ][is.na(claim10month),claim10month := 0
            ][is.na(claim11month),claim11month := 0
              ][is.na(claim12month),claim12month := 0]

However, this is quite repetitive, and I wanted to shorten it by using a loop (not sure if that is the smartest idea though?):

  for (i in 9:12){
    claimsMonthly[is.na(paste0("claim", i, "month")), paste0("claim", i, "month") := 0]
  }

When I run this loop, nothing happens. I guess it is due to the fact that paste0() returns the string "claim12month", so I end up with is.na("claim12month"). The result of that is FALSE even though there are NAs in my data. I guess this has something to do with the quotes?

This is not the first time I have had issues with using paste0() or running loops with data.table, so I must be missing something important here.

Any ideas how to fix this?

Upvotes: 6

Views: 3481

Answers (2)

akrun

Reputation: 887153

We can pass the names of the columns ('nm1') to .SDcols, loop over .SD (Subset of Data.table) with lapply, and replace the NA values with 0 (replace_na from tidyr):

library(data.table)
library(tidyr)
nm1 <- paste0("claim", 9:12, "month")
setDT(claimsMonthly)[, (nm1) := lapply(.SD, replace_na, 0), .SDcols = nm1]

Or, as @jangorecki mentioned in the comments, nafill from data.table would be better:

setDT(claimsMonthly)[, (nm1) := lapply(.SD, nafill, fill = 0), .SDcols = nm1]

Or use a for loop with set: for each column of interest, assign 0 to the rows where that column is NA, passing the row index as i and the column name as j:

for (j in nm1) {
    set(claimsMonthly, i = which(is.na(claimsMonthly[[j]])), j = j, value = 0)
}

Or with setnafill, which updates the columns by reference:

setnafill(claimsMonthly, cols = nm1, fill = 0)
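As for why the loop in the question does nothing: is.na() is applied to the string that paste0() returns, not to the column with that name, so the filter never matches a row. A minimal sketch of a fix, assuming toy data in place of the real claimsMonthly (the values and the loop variable `col` are made up for illustration), uses get() to look the column up by name and (col) := to assign to it:

```r
library(data.table)

# Toy stand-in for claimsMonthly; the values are made up
claimsMonthly <- data.table(
  id           = 1:3,
  claim9month  = c(1, NA, 3),
  claim10month = c(NA, 2, NA)
)

# A non-missing string is never NA, so this filter selects no rows
is.na("claim9month")
# FALSE

# get() fetches the column named in col; (col) := assigns to that column
for (col in c("claim9month", "claim10month")) {
  claimsMonthly[is.na(get(col)), (col) := 0]
}
```

This keeps the shape of the original loop; the set and setnafill versions above are usually the more idiomatic choice.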

Upvotes: 7

COLO

Reputation: 1114

You can use:

claimsMonthly[, 9:12][is.na(claimsMonthly[, 9:12])] <- 0

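You can also use the column names. Keep the comma: on a data.table, a character vector passed without it is matched against rows (via the key) rather than used to select columns:

claimsMonthly[, c("claim9month", "claim10month", "claim11month", "claim12month")][is.na(claimsMonthly[, c("claim9month", "claim10month", "claim11month", "claim12month")])] <- 0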

Or, even better, you can build a vector of all the column names that match the "claimXXmonth" pattern.
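One way to build that vector, sketched here on made-up toy data, is to grep the column names with a regular expression. The pattern and the extra `claimant` column are assumptions for illustration, and the fill step uses data.table's setnafill, which expects numeric columns:

```r
library(data.table)

# Toy stand-in for claimsMonthly; the values are made up
claimsMonthly <- data.table(
  id           = 1:2,
  claim9month  = c(NA, 5),
  claim12month = c(7, NA),
  claimant     = c("a", "b")   # should not match the pattern
)

# Keep only names of the form claim<digits>month
cols <- grep("^claim[0-9]+month$", names(claimsMonthly), value = TRUE)

# Replace NA with 0 in those columns, by reference
setnafill(claimsMonthly, cols = cols, fill = 0)
```

Anchoring the pattern with ^ and $ avoids picking up columns that merely contain "claim".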

Upvotes: 0
