Reputation: 101
I have the following dataframe in R
library(dplyr)
library(tidyr)
df= data.frame("ID"= c("A", "A", "A", "A", "B", "B", "B"),
"A1"= c(0,0, 0, 0, 1,0,1), "B1"= c(1,0, 1,0, 0, 0, 0))
The dataframe appears as follows
ID A1 B1
1 A 0 1
2 A 0 0
3 A 0 1
4 A 0 0
5 B 1 0
6 B 0 0
7 B 1 0
I would like to obtain the following dataframe
ID A1 B1
1 A NA 1
2 A NA 0
3 A NA 1
4 A NA 0
5 B 1 NA
6 B 0 NA
7 B 1 NA
I have tried the following code
df%>%group_by(ID)%>%
mutate(A1=case_when(sum(A1)==0~NA))%>%
mutate(B1=case_when(sum(B1)==0~NA))
This converts A1 and B1 completely to NA values.
I request some help here.
Upvotes: 3
Views: 878
Reputation: 887118
With tidyverse
, we can use if/else
library(tidyverse)
df %>%
group_by(ID) %>%
mutate_all(list(~ if(all(.==0)) NA_integer_ else .))
# ID A1 B1
# <fct> <dbl> <dbl>
#1 A NA 1
#2 A NA 0
#3 A NA 1
#4 A NA 0
#5 B 1 NA
#6 B 0 NA
#7 B 1 NA
Or without any if/else
df %>%
group_by(ID) %>%
mutate_all(~ NA^all(!.) * .)
or using data.table
library(data.table)
setDT(df)[, lapply(.SD, function(x) replace(x, all(x == 0), NA)), ID]
Or using base R
by(df[-1], df$ID, FUN = function(x) x * (NA^ !colSums(!!x))[col(x)])
Upvotes: 2
Reputation: 388982
We can group_by
ID
and use mutate_all
with replace
library(dplyr)
df %>%
group_by(ID) %>%
mutate_all(~replace(., all(. == 0), NA))
# ID A1 B1
# <fct> <dbl> <dbl>
#1 A NA 1
#2 A NA 0
#3 A NA 1
#4 A NA 0
#5 B 1 NA
#6 B 0 NA
#7 B 1 NA
If there are other columns and we want to apply this only to specific columns we can use mutate_at
df %>%
group_by(ID) %>%
mutate_at(vars(A1,B1), ~replace(., all(. == 0), NA))
Using case_when
we can do this as
df %>%
group_by(ID) %>%
mutate_all(~case_when(all(. == 0) ~ NA_real_, TRUE ~ .))
The problem in OP's attempt was there was no TRUE
case defined in case_when
so when no condition is matched it returns NA
by default. From ?case_when
If no cases match, NA is returned.
So if we define the TRUE
case it would work as expected. Also we should not check for sum(A1)==0
because if there are negative and positive values in the column (like -2 , +2) they would add up to 0 giving unexpected results.
df%>%
group_by(ID) %>%
mutate(A1 = case_when(all(A1 == 0) ~ NA_real_, TRUE ~ A1),
B1 = case_when(all(B1 == 0) ~ NA_real_, TRUE ~ B1))
Upvotes: 3