mac

Reputation: 35

Loop over certain columns to replace NAs with 0 in a dataframe

I have spent a lot of time trying to write a loop that replaces NAs with zeros in certain columns of a data frame, and I have not yet succeeded. I have searched but can't find a similar question.

df <- data.frame(A = c(2, 4, 6, NA, 8, 10),
                 B = c(NA, 10, 12, 14, NA, 16),
                 C = c(20, NA, 22, 24, 26, NA),
                 D = c(30, NA, NA, 32, 34, 36))
df

Gives me:

   A  B  C  D
1  2 NA 20 30
2  4 10 NA NA
3  6 12 22 NA
4 NA 14 24 32
5  8 NA 26 34
6 10 16 NA 36

I want to set NAs to 0 for only columns B and D. Using separate code lines, I could:

df$B[is.na(df$B)] <- 0
df$D[is.na(df$D)] <- 0

However, I want to use a loop because I have many variables in my real data set.

I cannot find a way to loop over only columns B and D so I get:

df

   A  B  C  D
1  2  0 20 30
2  4 10 NA  0
3  6 12 22  0
4 NA 14 24 32
5  8  0 26 34
6 10 16 NA 36

Essentially, I want to apply a loop using a variable list to a data frame:

varlist <- c("B", "D") 

How can I loop over only certain columns in the data frame using a variable list to replace NAs with zeros?

Upvotes: 0

Views: 1096

Answers (3)

RDRR

Reputation: 880

Here's a base R one-liner:

df[, varlist][is.na(df[, varlist])] <- 0
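
Since the question asks for an explicit loop, the same replacement can also be written with for. This is only a minimal sketch, assuming df and varlist are defined as in the question; it repeats the per-column line from the question once for each name in varlist.

```r
# Data and column list copied from the question
df <- data.frame(A = c(2, 4, 6, NA, 8, 10),
                 B = c(NA, 10, 12, 14, NA, 16),
                 C = c(20, NA, 22, 24, 26, NA),
                 D = c(30, NA, NA, 32, 34, 36))
varlist <- c("B", "D")

# Loop over the column names; df[[v]] selects the column by name,
# so only the columns listed in varlist are modified
for (v in varlist) {
  df[[v]][is.na(df[[v]])] <- 0
}

df
```

Columns A and C are never touched by the loop, so their NAs remain.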

Upvotes: 2

Onyambu

Reputation: 79208

Using the zoo package, we can fill the selected columns:

 library(zoo)
 df[varlist] <- na.fill(df[varlist], 0)
 df
    A  B  C  D
 1  2  0 20 30
 2  4 10 NA  0
 3  6 12 22  0
 4 NA 14 24 32
 5  8  0 26 34
 6 10 16 NA 36

In base R, we can use lapply over the selected columns:

 df[varlist] <- lapply(df[varlist], function(x) { x[is.na(x)] <- 0; x })
 df
    A  B  C  D
 1  2  0 20 30
 2  4 10 NA  0
 3  6 12 22  0
 4 NA 14 24 32
 5  8  0 26 34
 6 10 16 NA 36

Upvotes: 1

missuse

Reputation: 19716

Here is a tidyverse approach:

library(tidyverse)
df %>%
  mutate_at(.vars = vars(B, D), .funs = funs(ifelse(is.na(.), 0, .)))
#output:
   A  B  C  D
1  2  0 20 30
2  4 10 NA  0
3  6 12 22  0
4 NA 14 24 32
5  8  0 26 34
6 10 16 NA 36

Basically, you specify that columns B and D should be transformed by the given function, where . stands for the column currently being processed.
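
In more recent dplyr releases, funs() is deprecated and mutate_at() is superseded by across(). A possible equivalent is sketched below, assuming dplyr 1.0.0 or later is installed; it also takes the column names from the varlist vector in the question instead of hard-coding B and D. The name df_filled is only illustrative and does not appear in the question.

```r
library(dplyr)

# Data and column list copied from the question
df <- data.frame(A = c(2, 4, 6, NA, 8, 10),
                 B = c(NA, 10, 12, 14, NA, 16),
                 C = c(20, NA, 22, 24, 26, NA),
                 D = c(30, NA, NA, 32, 34, 36))
varlist <- c("B", "D")

# across() applies the function to each selected column;
# all_of() selects columns by the names stored in a character vector.
# df_filled is a hypothetical name for the result.
df_filled <- df %>%
  mutate(across(all_of(varlist), ~ replace(.x, is.na(.x), 0)))

df_filled
```

Note that mutate() returns a new data frame, so the result has to be assigned if it should be kept.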

Upvotes: 3
