Reputation: 436

Replace NA with 0 depending on group (rows) and variable names (column)

I have a large data set and want to replace many NAs, but not all.

In one group I want to replace all NAs with 0. In the other group I also want to replace NAs with 0, but only in variables whose names do not contain a certain substring, e.g. 'b'.

Here is an example:

group <- c(1,1,2,2,2)
abc <- c(1,NA,NA,NA,NA)
bcd <- c(2,1,NA,NA,NA)
cde <- c(5,NA,NA,1,2)
df <- data.frame(group,abc,bcd,cde)

  group abc bcd cde
1     1   1   2   5
2     1  NA   1  NA
3     2  NA  NA  NA
4     2  NA  NA   1
5     2  NA  NA   2

This is what I want:

  group abc bcd cde
1     1   1   2   5
2     1   0   1   0
3     2  NA  NA   0
4     2  NA  NA   1
5     2  NA  NA   2

This is what I tried:

#set 0 in first group: this works fine
df[is.na(df) & df$group==1] <- 0
#set 0 in second group but only if the variable name does not include b: does not work
df[is.na(df) & df$group==2 & !grepl('b',colnames(df))] <- 0

dplyr solutions are welcome, as are base R ones.

Upvotes: 1

Answers (3)

Diego Rojas

Reputation: 199

Alternatively, you can use:

library(dplyr)
df2 <- df %>% mutate_at(vars(names(df)[-1]),
  function(x) case_when(
    (group == 1 & is.na(x)) ~ 0,
    (group == 2 & is.na(x) & !grepl("b", deparse(substitute(x)))) ~ 0,
    TRUE ~ x))
> df2
  group abc bcd cde
1     1   1   2   5
2     1   0   1   0
3     2  NA  NA   0
4     2  NA  NA   1
5     2  NA  NA   2
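
A possible variation on the above, as a sketch only: it swaps in across() and cur_column() (this assumes dplyr >= 1.0.0) so the column name comes from dplyr itself rather than from deparse(substitute(x)), which depends on how mutate_at happens to call the function. The name out is just illustrative.

```r
library(dplyr)  # assumption: dplyr >= 1.0.0, for across() and cur_column()

# Example data from the question
df <- data.frame(
  group = c(1, 1, 2, 2, 2),
  abc   = c(1, NA, NA, NA, NA),
  bcd   = c(2, 1, NA, NA, NA),
  cde   = c(5, NA, NA, 1, 2)
)

out <- df %>%
  mutate(across(
    -group,  # every column except the grouping column
    ~ if_else(
      # group 1: fill every NA; group 2: fill only if the column name has no "b"
      is.na(.x) & (group == 1 | (group == 2 & !grepl("b", cur_column()))),
      0, .x
    )
  ))
out
```

Here group is looked up as a column of df inside mutate(), so the sketch does not need a separate group vector in the workspace.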

Upvotes: 0

Douglas Mesquita

Reputation: 1021

Using dplyr::mutate_at you can also do:

library(dplyr)

vars_mutate_1 <- names(df)[-1]
vars_mutate_2 <- grep(x = names(df)[-1], pattern = '^(?!.*b).*$', perl = TRUE, value = TRUE)

df %>% 
  mutate_at(.vars = vars_mutate_1, .funs = ~ if_else(group == 1 & is.na(.), 0, .)) %>%
  mutate_at(.vars = vars_mutate_2, .funs = ~ if_else(group == 2 & is.na(.), 0, .))

  group abc bcd cde
1     1   1   2   5
2     1   0   1   0
3     2  NA  NA   0
4     2  NA  NA   1
5     2  NA  NA   2

Upvotes: 0

akrun

Reputation: 887911

For the second group, build a logical column index with grepl and use it to subset the data while assigning (this assumes the first group has already been handled as in the question):

j1 <- !grepl('b',colnames(df))
df[j1][df$group == 2 & is.na(df[j1])] <- 0
df
#  group abc bcd cde
#1     1   1   2   5
#2     1   0   1   0
#3     2  NA  NA   0
#4     2  NA  NA   1
#5     2  NA  NA   2
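
The second line in the question likely fails because the length-4 result of grepl() is recycled down the rows of the 5 x 4 matrix instead of being applied once per column. As a rough sketch, both groups can be handled in one step from the original data by building the row-by-column mask explicitly with outer(); the names keep_col and mask are only illustrative.

```r
# Example data from the question
df <- data.frame(
  group = c(1, 1, 2, 2, 2),
  abc   = c(1, NA, NA, NA, NA),
  bcd   = c(2, 1, NA, NA, NA),
  cde   = c(5, NA, NA, 1, 2)
)

# TRUE for columns whose name does not contain "b"
keep_col <- !grepl("b", names(df))

# Row-by-column mask: group 1 rows qualify in every column,
# group 2 rows only in the columns flagged by keep_col
mask <- is.na(df) &
  (df$group == 1 | outer(df$group == 2, keep_col, "&"))

df[mask] <- 0
df
```

outer(rows, cols, "&") gives a logical matrix with one row per observation and one column per variable, so it lines up with is.na(df) cell by cell.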

Upvotes: 1
