Reputation: 11
I have two measures for the same object. The measure is binary (1,0) but many observations are also missing, such that the possible options are: 1, 0, NA.
Data Have:
Source1 Source2
NA NA
NA 0
NA 1
0 NA
0 0
0 1
1 NA
1 0
1 1
(The sources can contradict each other; ignore that for now.)
I would like to create a third composite variable that summarizes the two variables, such that IF EITHER of the two sources = 1, then the composite variable should be equal to 1. Otherwise, if either of the sources is not missing, then the composite variable should be equal to zero. Lastly, only if both sources are missing, the composite variable should be set to missing.
Data Want:
Source1 Source2 Composite
NA NA NA
NA 0 0
NA 1 1
0 NA 0
0 0 0
0 1 1
1 NA 1
1 0 1
1 1 1
I have tried different approaches but continue to have the same issue.
Attempt 1:
df <- df %>% mutate(combined = ifelse(df$source1==1 | df$source2==1, 1,
                              ifelse(df$source1==0 | df$source2==0, 0, NA)))
Attempt 2:
df2 <- df %>% mutate(combined = ifelse(is.na(df$source1) & is.na(df$source2), NA,
                               ifelse(df$source1 == 1 | df$source2 == 1, 1, 0)))
Attempt 3:
df3 <- df %>% mutate(combined = ifelse(df$source1==1, 1,
                               ifelse(df$source1==0 & df$source2==1, 1,
                               ifelse(df$source1==0 & df$source2==0, 0,
                               ifelse(df$source1==0 & is.na(df$source2), 0,
                               ifelse(is.na(df$source1) & df$source2==1, 1,
                               ifelse(is.na(df$source1) & df$source2==0, 0, NA)))))))
The code identifies whether there is a 1 in either source, but all the remaining values come out missing, regardless of whether a 0 is present.
Actual Output:
Source1 Source2 Composite
NA NA NA
NA 0 NA
NA 1 1
0 NA NA
0 0 NA
0 1 1
1 NA 1
1 0 1
1 1 1
Upvotes: 1
Views: 68
Reputation: 5138
Assuming both the Source1 and Source2 columns are composed of 0's, 1's, and NA's (as you noted), you could use this as a base R solution. It uses do.call() to call pmax() over each of the relevant columns in your data frame.
cols = paste0("Source", 1:2)
df$newcol = do.call(pmax, c(df[cols], na.rm = TRUE))
# equivalent to: pmax(df$Source1, df$Source2, na.rm = TRUE)
df
Source1 Source2 Composite newcol
1 NA NA NA NA
2 NA 0 0 0
3 NA 1 1 1
4 0 NA 0 0
5 0 0 0 0
6 0 1 1 1
7 1 NA 1 1
8 1 0 1 1
9 1 1 1 1
Data:
df = read.table(header = TRUE, text = "Source1 Source2 Composite
NA NA NA
NA 0 0
NA 1 1
0 NA 0
0 0 0
0 1 1
1 NA 1
1 0 1
1 1 1")
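Because the columns are passed to pmax() as a list, the same do.call() call should extend to more than two sources. Here is a minimal sketch of that, assuming a hypothetical third column Source3 and a small made-up data frame df3 (neither is in the question):

```r
# Hypothetical data: Source3 and df3 are invented for illustration only
df3 <- data.frame(
  Source1 = c(NA, NA, 0, 1),
  Source2 = c(NA, 0, NA, 0),
  Source3 = c(NA, NA, 1, NA)
)

# Pick up every column whose name starts with "Source"
cols <- grep("^Source", names(df3), value = TRUE)

# Row-wise maximum ignoring NAs; a row that is entirely NA stays NA
df3$Composite <- do.call(pmax, c(df3[cols], na.rm = TRUE))
df3
```

Selecting the columns by pattern rather than by paste0() is only a convenience here; either way the rule stays "1 if any source is 1, 0 if at least one source is observed, NA otherwise".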
Upvotes: 1
Reputation: 1339
This was fun, but I wouldn't recommend doing it like this.
source1<-c(NA, NA, NA, 0, 0, 0, 1, 1, 1)
source2<-c(NA, 0, 1, NA, 0, 1, NA, 0, 1)
df<-data.frame(source1, source2)
df$composite <- ifelse(test = is.na(df$source1) & is.na(df$source2), yes = NA,
                no = ifelse(test = is.na(df$source1) & !is.na(df$source2), yes = df$source2,
                no = ifelse(test = is.na(df$source2) & !is.na(df$source1), yes = df$source1,
                no = ifelse(test = df$source1 > df$source2, yes = df$source1,
                no = df$source2))))
source1 source2 composite
1 NA NA NA
2 NA 0 0
3 NA 1 1
4 0 NA 0
5 0 0 0
6 0 1 1
7 1 NA 1
8 1 0 1
9 1 1 1
Upvotes: 0
Reputation: 8602
One approach is to use case_when rather than if-else. It seems simplest to check for missing variables first, and then check the non-missing cases afterwards:
library(tidyverse)
df %>%
  mutate(S1Miss = is.na(Source1),
         S2Miss = is.na(Source2)) %>%
  mutate(Composite = case_when(
    S1Miss & S2Miss ~ NA_real_,          # both sources missing -> missing
    Source1 == 1 | Source2 == 1 ~ 1,     # a 1 in either source -> 1
    TRUE ~ 0                             # otherwise at least one observed 0 -> 0
  )) %>%
  select(Source1, Source2, Composite)
Note that I made this easier to read by first storing the missingness flags in one call to mutate and then removing these intermediate results using select.
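The reason for testing missingness explicitly is probably worth spelling out: a comparison against NA returns NA, and ifelse() passes an NA test straight through to its result. A minimal base R sketch, using made-up vectors s1 and s2 (hypothetical names, not the question's columns):

```r
# Hypothetical vectors covering the rows (NA, 0), (NA, 1) and (0, NA)
s1 <- c(NA, NA, 0)
s2 <- c(0, 1, NA)

# `==` returns NA when either side is NA; `|` only resolves to TRUE
# when one side is definitely TRUE, otherwise the NA survives
test <- s1 == 1 | s2 == 1
test                          # NA TRUE NA

# ifelse() returns NA wherever its test is NA, so the 0s are lost
naive <- ifelse(test, 1, 0)
naive                         # NA 1 NA
```

case_when behaves differently in this respect: a condition that evaluates to NA is treated as not matched, so the row falls through to the next condition instead of becoming NA.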
Upvotes: 0