JKiss

Reputation: 13

How to use R ifelse statements with multiple conditions?

I am new to R but am excited to learn it and I thought this might be a good opportunity. I have two measurements of salinity (uS and mS.m_1.5). I have created 3 classes (1, 2, 3) for each measurement type (uSClass and mS.m_1.5Class) based on their values. For many of the observations, I only have 1 measurement type. I want to create a new class (SClass) based on these two classes.

Any observation of uSClass = 1 and mS.m_1.5Class = 1, should be SClass 1.

Any observation of uSClass = 1 and mS.m_1.5Class = NA, should be SClass 1.

Any observation of uSClass = NA and mS.m_1.5Class = 1, should be SClass 1. etc...

Any observation with conflicting classes (ex. uSClass = 1 and mS.m_1.5Class = 2) should not be assigned a class (NA). This is my code:

    std$SClass <- ifelse(std$uSClass == 1 & std$mS.m_1.5Class == 1, 1, 
                     ifelse(std$uSClass == 1 & is.na(std$mS.m_1.5Class), 1,
                        ifelse(is.na(std$uSClass) & std$mS.m_1.5Class == 1, 1,
                  ifelse(std$uSClass == 2 & std$mS.m_1.5Class == 2, 2,
                     ifelse(std$uSClass == 2 & is.na(std$mS.m_1.5Class), 2,
                        ifelse(is.na(std$uSClass) & std$mS.m_1.5Class == 2, 2,
                  ifelse(std$uSClass == 3 & std$mS.m_1.5Class == 3, 3,
                     ifelse(std$uSClass == 3 & is.na(std$mS.m_1.5Class), 3,
                        ifelse(is.na(std$uSClass) & std$mS.m_1.5Class == 3, 3, NA)))))))))

It makes logical sense to me but it must not be correct. The only classifications that work are those where both uSClass and mS.m_1.5Class have values. If I run the entire code, most observations are assigned NA. I have tried a couple other methods incorporating | operators but those have not worked either. Your help is appreciated!

Upvotes: 1

Views: 9009

Answers (3)

Gregor Thomas

Reputation: 145775

The rowMeans approach works well in this case and will be very difficult to beat speed-wise. For a more general approach, most of what you're doing is finding the first non-missing value across a series of columns. This is commonly called a "coalesce", and it is built in to the dplyr package (among others).

If you didn't have mismatches, then your operation could be simplified to this (using Pierre's nicely shared data):

mydata$r <- with(mydata, dplyr::coalesce(Var1, Var2))
mydata
#    Var1 Var2  r
# 1     1    1  1
# 4    NA    1  1
# 6     2    2  2
# 8    NA    2  2
# 11    3    3  3
# 12   NA    3  3
# 13    1   NA  1
# 14    2   NA  2
# 15    3   NA  3
# 16   NA   NA NA
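
For readers without dplyr, the coalesce idea for two columns can be sketched in base R. The helper name coalesce2 is hypothetical (it is not part of any package), and unlike dplyr::coalesce, which accepts any number of vectors, it only handles two:

```r
# Hypothetical helper: element by element, take x unless it is missing,
# in which case fall back to y.
coalesce2 <- function(x, y) ifelse(is.na(x), y, x)

x <- c(1, NA, 2, NA)
y <- c(1, 1, NA, NA)
coalesce2(x, y)
# [1]  1  1  2 NA
```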

With mismatches, we need to check for those separately:

std$r = with(std, ifelse(Var1 != Var2 & !is.na(Var1) & !is.na(Var2), NA,
                         dplyr::coalesce(Var1, Var2)))
std
#    Var1 Var2  r
# 1     1    1  1
# 2     2    1 NA
# 3     3    1 NA
# 4    NA    1  1
# 5     1    2 NA
# 6     2    2  2
# 7     3    2 NA
# 8    NA    2  2
# 9     1    3 NA
# 10    2    3 NA
# 11    3    3  3
# 12   NA    3  3
# 13    1   NA  1
# 14    2   NA  2
# 15    3   NA  3
# 16   NA   NA NA

We can also go back to ifelse for a nice vectorized solution. I've wrapped it in a function as in @dayne's answer, but using the vectorized ifelse rather than if(){}else{} with an external call to mapply gives a big speed improvement (though rowMeans is still fastest):

getClass3 <- function(c1, c2) {
  ifelse((!is.na(c1) & !is.na(c2)),
         ifelse(c1 == c2, c1, NA),
         ifelse(is.na(c1), c2, c1))
}
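
As a quick sanity check, the function can be run on the small test vectors from @dayne's answer. The definition is repeated here only so that the snippet runs on its own:

```r
# Vectorized version: no mapply needed, the whole columns are passed at once.
getClass3 <- function(c1, c2) {
  ifelse(!is.na(c1) & !is.na(c2),
         ifelse(c1 == c2, c1, NA),    # both present: keep only if they agree
         ifelse(is.na(c1), c2, c1))   # at least one missing: take the other
}

c1 <- c(1,  2, NA, 3, NA, NA,  2, NA,  1, 1, 2, 3)
c2 <- c(NA, NA, 1, 2,  1,  3, NA, NA, NA, 1, 2, 3)
getClass3(c1, c2)
# [1]  1  2  1 NA  1  3  2 NA  1  1  2  3
```

The result matches what mapply(getClass2, ...) returns for the same input.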


library(microbenchmark)
library(dplyr)  # for coalesce()

microbenchmark(plafortune = {
    r <- rowMeans(std, na.rm = TRUE)
    is.na(r) <- !r %in% 1:3 | std[, 1] != std[, 2]
},
dayne = {
    mapply(getClass2, c1 = std[, 1], c2 = std[, 2])
},
coal = {
    ifelse(std[, 1] != std[, 2] & !is.na(std[, 1]) & !is.na(std[, 2]), NA, coalesce(std[, 1], std[, 2]))
},
getClass_ifelse = {
    getClass3(std[, 1], std[, 2])
}
)
# Unit: milliseconds
#             expr       min        lq      mean    median        uq      max neval  cld
#       plafortune  10.09130  10.49593  18.95146  12.31516  14.46738 194.7095   100 a   
#            dayne 466.60288 499.47639 552.12454 529.53229 573.53311 823.2745   100    d
#             coal  20.70184  24.10026  40.87038  26.22795  31.20252 217.3142   100  b  
#  getClass_ifelse  50.90161  56.41823  96.69930  64.78723  95.32416 262.2016   100   c 

Running on the large data (1e5 rows), rowMeans is definitely fastest. Coalesce does pretty well, and the vectorized ifelse is still close to an order of magnitude faster than the one-row-at-a-time mapply version. It's worth noting that if more columns were involved, the rowMeans advantage would probably grow, and it would also be by far the easiest to code.

Upvotes: 2

dayne

Reputation: 7784

I think this gives what you are asking for:

getClass <- function(c1, c2) {
  if (!is.na(c1) && !is.na(c2)) {
    return(NA)
  } else {
    return(ifelse(is.na(c1), c2, c1))
  }
  NA
}

c1 <- c(1,  2, NA, 3, NA, NA,  2, NA,  1)
c2 <- c(NA, NA, 1, 2,  1,  3, NA, NA, NA)
mapply(getClass, c1 = c1, c2 = c2)
# [1]  1  2  1 NA  1  3  2 NA  1

EDIT

If you want values that have the same class to return that class, just modify the first if statement:

getClass2 <- function(c1, c2) {
  if (!is.na(c1) && !is.na(c2) && c1 != c2) {
    return(NA)
  } else {
    return(ifelse(is.na(c1), c2, c1))
  }
  NA
}
c1 <- c(1,  2, NA, 3, NA, NA,  2, NA,  1, 1, 2, 3)
c2 <- c(NA, NA, 1, 2,  1,  3, NA, NA, NA, 1, 2, 3)
mapply(getClass2, c1 = c1, c2 = c2)
# [1]  1  2  1 NA  1  3  2 NA  1  1  2  3

Upvotes: 1

Pierre L

Reputation: 28441

You may be looking for rowMeans as a logical shortcut.

rowMeans(mydata, na.rm=TRUE)

Example

#Create example with all possible combinations
std <- expand.grid(c(1:3,NA), c(1:3,NA))
ind <- apply(std, 1, function(x) anyDuplicated(x) | any(is.na(x)))

mydata <- std[ind,]
mydata
#    Var1 Var2
# 1     1    1
# 4    NA    1
# 6     2    2
# 8    NA    2
# 11    3    3
# 12   NA    3
# 13    1   NA
# 14    2   NA
# 15    3   NA
# 16   NA   NA

The example is set up: mydata holds every combination of 1 to 3 and NA in which the two columns either match or at least one is missing. We use rowMeans to solve the problem:

mydata$SClass <- rowMeans(mydata, na.rm=TRUE)
mydata
#    Var1 Var2 SClass
# 1     1    1      1
# 4    NA    1      1
# 6     2    2      2
# 8    NA    2      2
# 11    3    3      3
# 12   NA    3      3
# 13    1   NA      1
# 14    2   NA      2
# 15    3   NA      3
# 16   NA   NA    NaN

Edit

It makes no difference if there are also some mismatches. We can add:

r <- rowMeans(std, na.rm=TRUE)
is.na(r) <- !r %in% 1:3 | std[,1] != std[,2]

#Verification
cbind(std, r)
#    Var1 Var2  r
# 1     1    1  1
# 2     2    1 NA
# 3     3    1 NA
# 4    NA    1  1
# 5     1    2 NA
# 6     2    2  2
# 7     3    2 NA
# 8    NA    2  2
# 9     1    3 NA
# 10    2    3 NA
# 11    3    3  3
# 12   NA    3  3
# 13    1   NA  1
# 14    2   NA  2
# 15    3   NA  3
# 16   NA   NA NA

The output above confirms that every possible combination is classified correctly.
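
To see why the is.na() assignment needs both conditions, here is a small sketch on made-up vectors (a and b are illustrative only, not the question's data). The %in% test alone catches fractional means and NaN, but a (1, 3) pair averages to a valid-looking 2:

```r
a <- c(1, 1, 1, NA, NA)
b <- c(1, 2, 3,  2, NA)

r <- rowMeans(cbind(a, b), na.rm = TRUE)
r
# [1] 1.0 1.5 2.0 2.0 NaN

# !r %in% 1:3 flags 1.5 and NaN; a != b additionally flags the (1, 3) pair.
# Rows where a != b is NA (one value missing) are left untouched.
is.na(r) <- !r %in% 1:3 | a != b
r
# [1]  1 NA NA  2 NA
```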

Speed Test

Something for the doubters: by median time, rowMeans is roughly 50 times faster than the mapply approach.

Unit: milliseconds
       expr        min         lq      mean    median        uq       max neval cld
 plafortune   7.370385   9.246964  10.44307  10.10766  11.55795  18.72463   100  a 
      dayne 443.972804 506.965996 555.80049 550.91229 582.45713 831.18534   100   b

Data

std <- data.frame(x=sample(c(1:3,NA), 1e5, replace=TRUE), y=sample(c(1:3,NA), 1e5, replace=TRUE))

getClass <- function(c1, c2) {
  if (!is.na(c1) && !is.na(c2)) {
    return(NA)
  } else {
    return(ifelse(is.na(c1), c2, c1))
  }
  NA
}

library(microbenchmark)
microbenchmark(plafortune={r <- rowMeans(std, na.rm=TRUE)
is.na(r) <- !r %in% 1:3 | std[,1] != std[,2]},
dayne = {mapply(getClass, c1 = std[,1], c2 = std[,2])})

Upvotes: 2
