phenomenomics

Reputation: 11

R: Troubles with ifelse

I have two measures for the same object. The measure is binary (1,0) but many observations are also missing, such that the possible options are: 1, 0, NA.

Data Have:

Source1 Source2
NA      NA
NA      0
NA      1
0       NA
0       0
0       1
1       NA
1       0
1       1

(The sources can contradict each other; ignore that for now.)

I would like to create a third composite variable that summarizes the two variables, such that IF EITHER of the two sources = 1, then the composite variable should be equal to 1. Otherwise, if either of the sources is not missing, then the composite variable should be equal to zero. Lastly, only if both sources are missing, the composite variable should be set to missing.

Data Want:

Source1 Source2 Composite
NA      NA      NA
NA      0       0
NA      1       1
0       NA      0
0       0       0
0       1       1
1       NA      1
1       0       1
1       1       1

I have tried different approaches but continue to have the same issue.

Attempt 1:

df<- df %>% mutate(combined = ifelse(df$source1==1 | df$source2==1, 1, 
                              ifelse(df$source1==0 | df$source2==0, 0, NA)))

Attempt 2:

df2<- df %>% mutate(combined = ifelse(is.na(df$source1)   & is.na(df$source2), NA, 
                               ifelse(df$source1 == 1     | df$source2 ==1,    1, 0)))

Attempt 3:

df3<- df %>% mutate(combined = ifelse(df$source1==1, 1, 
                               ifelse(df$source1==0 & df$source2==1, 1,
                                      ifelse(df$source1==0 &     df$source2==0, 0,
                                      ifelse(df$source1==0 & is.na(df$source2), 0,       
                               ifelse(is.na(df$source1) & df$source2==1, 1,
                                      ifelse(is.na(df$source1) & df$source2==0, 0, NA)))))))

The code correctly identifies whether there is a 1 in either source, but all the remaining values come out missing, regardless of whether there is a 0.

Actual Output:

Source1 Source2 Composite
NA      NA      NA
NA      0       NA
NA      1       1
0       NA      NA
0       0       NA
0       1       1
1       NA      1
1       0       1
1       1       1
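A likely reason for the NA rows, shown as a minimal sketch rather than a diagnosis of the exact data: R uses three-valued logic, so a comparison against NA is itself NA, and ifelse() returns NA wherever its test is NA. The vectors s1 and s2 below are made-up stand-ins for three of the rows above.

```r
# Comparisons with NA are NA; `|` only resolves when the other side is TRUE
na_eq    <- NA == 1      # NA
na_or_t  <- NA | TRUE    # TRUE: the answer is TRUE whatever the NA stands for
na_or_f  <- NA | FALSE   # NA: the answer depends on the unknown value

# Hypothetical stand-ins for the rows (NA, 0), (NA, 1) and (0, NA)
s1 <- c(NA, NA, 0)
s2 <- c(0, 1, NA)

# Same test as the outer ifelse() in Attempt 1
test <- s1 == 1 | s2 == 1

# ifelse() passes an NA test straight through as NA, so the nested
# ifelse() in the "no" branch never gets a chance to supply a 0
res <- ifelse(test, 1, 0)
print(res)
```

Only the row containing a 1 gets a value here; the two rows with one NA and one 0 stay NA, which is why the missingness has to be tested explicitly with is.na().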

Upvotes: 1

Views: 68

Answers (3)

Andrew

Reputation: 5138

Assuming the Source1 and Source2 columns contain only 0s, 1s, and NAs (as you noted), you could use this base R solution. It uses do.call() to call pmax() across the relevant columns of your data frame; with na.rm = TRUE, the result is NA only when both values are NA.

cols = paste0("Source", 1:2)
df$newcol = do.call(pmax, c(df[cols], na.rm = TRUE))
# equivalent to: pmax(df$Source1, df$Source2, na.rm = TRUE)

df
  Source1 Source2 Composite  newcol
1      NA      NA        NA      NA
2      NA       0         0       0
3      NA       1         1       1
4       0      NA         0       0
5       0       0         0       0
6       0       1         1       1
7       1      NA         1       1
8       1       0         1       1
9       1       1         1       1

Data:

df = read.table(header = TRUE, text = "Source1 Source2 Composite
NA      NA      NA
NA      0       0
NA      1       1
0       NA      0
0       0       0
0       1       1
1       NA      1
1       0       1
1       1       1")
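As a quick sanity check, here is a self-contained sketch that rebuilds the nine combinations by hand and compares the pmax() result with the desired column. The name `matches` is just an illustrative helper and not part of the answer above.

```r
# Rebuild the nine combinations from the question (values typed in by hand)
df <- data.frame(
  Source1   = c(NA, NA, NA, 0, 0, 0, 1, 1, 1),
  Source2   = c(NA, 0, 1, NA, 0, 1, NA, 0, 1),
  Composite = c(NA, 0, 1, 0, 0, 1, 1, 1, 1)
)

# Element-wise maximum that ignores NAs; NA is returned only when
# both values in a row are NA
df$newcol <- pmax(df$Source1, df$Source2, na.rm = TRUE)

# TRUE if the computed column matches the desired one, NAs included
matches <- isTRUE(all.equal(df$newcol, df$Composite))
print(matches)
```

This only works because the values are restricted to 0, 1, and NA, so "either source is 1" is the same thing as "the maximum is 1".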

Upvotes: 1

D.J

Reputation: 1339

This was fun, but I wouldn't recommend doing it like this.

source1<-c(NA, NA, NA, 0, 0, 0, 1, 1, 1)
source2<-c(NA, 0, 1, NA, 0, 1, NA, 0, 1)

df<-data.frame(source1, source2)  

df$composite<-ifelse(test = is.na(df$source1) & is.na(df$source2), yes = NA, 
       no = ifelse(test = is.na(df$source1) & !is.na(df$source2), yes = df$source2, 
                   no = ifelse(is.na(df$source2) & !is.na(df$source1), yes = df$source1,
                               no = ifelse(df$source1 > df$source2, yes = df$source1,
                                           no = df$source2))))

  source1 source2 composite
1      NA      NA        NA
2      NA       0         0
3      NA       1         1
4       0      NA         0
5       0       0         0
6       0       1         1
7       1      NA         1
8       1       0         1
9       1       1         1

Upvotes: 0

Oliver

Reputation: 8602

One approach is to use case_when() rather than nested ifelse() calls. It seems simplest to check for missing values first, and then handle the non-missing cases afterwards:

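library(tidyverse)
df %>% 
  mutate(S1Miss = is.na(Source1),
         S2Miss = is.na(Source2)) %>% 
  mutate(Composite = case_when(
         # both sources missing -> missing (typed NA keeps the output numeric)
         S1Miss & S2Miss ~ NA_real_, 
         # a 1 in either source -> 1 (NA | TRUE evaluates to TRUE)
         Source1 == 1 | Source2 == 1 ~ 1,
         # anything left has at least one 0 and no 1 -> 0
         TRUE ~ 0
         )) %>% 
  select(Source1, Source2, Composite)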

Note that I made it "easier to read" by first storing the missingness flags in one call to mutate and then removing these intermediate columns with select.
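One reason case_when() is convenient for this kind of problem, shown as a small sketch: a condition that evaluates to NA is treated as not matching, so the row falls through to the next condition, whereas ifelse() returns NA for it. The vector x and the labels are made up for illustration, and the sketch assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical input with one missing value
x <- c(NA, 0, 1)

# case_when(): x == 1 is NA for the first element, which counts as
# "no match", so the catch-all TRUE branch applies
cw <- case_when(
  x == 1 ~ "one",
  TRUE   ~ "other"
)

# ifelse(): the NA test is passed straight through to the result
ie <- ifelse(x == 1, "one", "other")

print(cw)
print(ie)
```

The flip side is that case_when() never produces NA from a missing condition on its own, so the both-missing case needs its own explicit branch.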

Upvotes: 0
