Thilaga raghavan
Thilaga raghavan

Reputation: 101

Replace all zero columns with NA

I have the following dataframe in R

library(dplyr)
 library(tidyr)

df= data.frame("ID"= c("A", "A", "A", "A", "B", "B", "B"), 
"A1"= c(0,0, 0, 0, 1,0,1), "B1"= c(1,0, 1,0, 0, 0, 0))

The dataframe appears as follows

  ID A1 B1
1  A  0  1
2  A  0  0
3  A  0  1
4  A  0  0
5  B  1  0
6  B  0  0
7  B  1  0

I would like to obtain the following dataframe

   ID A1 B1
 1  A NA  1
 2  A NA  0
 3  A NA  1
 4  A NA  0
 5  B  1 NA
 6  B  0 NA
 7  B  1 NA

I have tried the following code

df%>%group_by(ID)%>%
mutate(A1=case_when(sum(A1)==0~NA))%>%
mutate(B1=case_when(sum(B1)==0~NA))

This converts A1 and B1 completely to NA values.

I request some help here.

Upvotes: 3

Views: 878

Answers (2)

akrun
akrun

Reputation: 887118

With tidyverse, we can use if/else

library(tidyverse)
df %>% 
   group_by(ID) %>% 
   mutate_all(list(~ if(all(.==0)) NA_integer_ else .))
# ID       A1    B1
#  <fct> <dbl> <dbl>
#1 A        NA     1
#2 A        NA     0
#3 A        NA     1
#4 A        NA     0
#5 B         1    NA
#6 B         0    NA
#7 B         1    NA

Or without any if/else

df %>%
   group_by(ID) %>%
   mutate_all(~ NA^all(!.) * .)

or using data.table

library(data.table)
setDT(df)[, lapply(.SD, function(x) replace(x, all(x == 0), NA)), ID]

Or using base R

by(df[-1], df$ID, FUN = function(x)  x * (NA^ !colSums(!!x))[col(x)])

Upvotes: 2

Ronak Shah
Ronak Shah

Reputation: 388982

We can group_by ID and use mutate_all with replace

library(dplyr)

df %>%
  group_by(ID) %>%
  mutate_all(~replace(., all(. == 0), NA))

# ID       A1    B1
#  <fct> <dbl> <dbl>
#1 A        NA     1
#2 A        NA     0
#3 A        NA     1
#4 A        NA     0
#5 B         1    NA
#6 B         0    NA
#7 B         1    NA

If there are other columns and we want to apply this only to specific columns we can use mutate_at

df %>%
  group_by(ID) %>%
  mutate_at(vars(A1,B1), ~replace(., all(. == 0), NA))

Using case_when we can do this as

df %>%
  group_by(ID) %>%
  mutate_all(~case_when(all(. == 0) ~ NA_real_, TRUE ~ .))

The problem in OP's attempt was there was no TRUE case defined in case_when so when no condition is matched it returns NA by default. From ?case_when

If no cases match, NA is returned.

So if we define the TRUE case it would work as expected. Also we should not check for sum(A1)==0 because if there are negative and positive values in the column (like -2 , +2) they would add up to 0 giving unexpected results.

df%>%
  group_by(ID) %>%
  mutate(A1 = case_when(all(A1 == 0) ~ NA_real_, TRUE ~ A1), 
         B1 = case_when(all(B1 == 0) ~ NA_real_, TRUE ~ B1))

Upvotes: 3

Related Questions