josliber

Redefining %in% for matrices

I love being able to operate across matrix elements in R with operators like == and |:

(m <- matrix(1:4, nrow=2))
#      [,1] [,2]
# [1,]    1    3
# [2,]    2    4

m == 2 | m == 3
#       [,1]  [,2]
# [1,] FALSE  TRUE
# [2,]  TRUE FALSE

Unfortunately, %in% doesn't have this same nice behavior, and returns a vector instead of a matrix:

m %in% c(2, 3)
# [1] FALSE  TRUE  TRUE FALSE
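This aside is not from the thread. If all you need is a matrix-shaped result for a one-off comparison, a sketch that avoids redefining anything is to put the original dimensions back onto the vector that %in% returns:

```r
m <- matrix(1:4, nrow = 2)

# %in% returns a plain logical vector in column-major order...
res <- m %in% c(2, 3)

# ...so reattaching the original dimensions turns it back into a matrix
dim(res) <- dim(m)
res
#       [,1]  [,2]
# [1,] FALSE  TRUE
# [2,]  TRUE FALSE
```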

Noting that %in% is defined as function(x, table) match(x, table, nomatch = 0L) > 0L, I figured I could redefine match to get my desired behavior:

match <- function(x, table, nomatch = NA_integer_, incomparables = NULL) {
  m <- base::match(x, table, nomatch, incomparables)
  if (is.matrix(x)) matrix(m, nrow(x))
  else m
}

While this does work if I explicitly call match, I still don't get the desired result when running m %in% c(2, 3):

match(m, c(2, 3), nomatch=0L) > 0L
#       [,1]  [,2]
# [1,] FALSE  TRUE
# [2,]  TRUE FALSE
m %in% c(2, 3)
# [1] FALSE  TRUE  TRUE FALSE

Why isn't %in% now returning a matrix?

Answers (2)

josliber

Thanks to @joran for pointing me to this excellent article, which clarified for me why %in% was not using my newly defined match function. Here's my understanding of what's going on:

The user-defined match function is stored in the global environment, while the original match function is still stored in namespace:base:

environment(match)
# <environment: R_GlobalEnv>
environment(base::match)
# <environment: namespace:base>

Now, consider what happens when I call m %in% c(2, 3):

  1. This executes the %in% function, which is just defined as function(x, table) match(x, table, nomatch = 0L) > 0L.
  2. The function needs to find the match function, so it first searches in its local environment that was created as part of the function call. match is not defined there.
  3. The next place to look for match is the enclosing environment of the function. We can figure out what that is with:

environment(`%in%`)
# <environment: namespace:base>
  4. Since the original version of match (not the user-defined version) is defined in namespace:base, this is the version of the function that is called.
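The search in steps 2 to 4 can be reproduced with get(), starting from the enclosing environment of base's %in%. This is a small check rather than part of the original answer, and it assumes the question's match has been defined in the global environment:

```r
# The question's matrix-aware match, living in the global environment
match <- function(x, table, nomatch = NA_integer_, incomparables = NULL) {
  m <- base::match(x, table, nomatch, incomparables)
  if (is.matrix(x)) matrix(m, nrow(x)) else m
}

# The enclosing environment of base's %in% is namespace:base
enc <- environment(base::`%in%`)

# Searching for "match" from there finds base's version first,
# so the global redefinition is never reached
found <- get("match", envir = enc, mode = "function")
identical(found, base::match)  # TRUE
identical(found, match)        # FALSE
```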

To get my matrix version of %in% to work, the simplest approach is to follow the advice of @Molx and redefine %in% in the global environment, so that its enclosing environment is R_GlobalEnv (note that there's still an identical version of the function in namespace:base):

`%in%` <- function(x, table) match(x, table, nomatch = 0L) > 0L
environment(`%in%`)
# <environment: R_GlobalEnv>

Now m %in% c(2, 3) will search for the match function first in its local function environment and then in the enclosing environment (R_GlobalEnv), finding our user-defined version of the match function:

m %in% c(2, 3)
#       [,1]  [,2]
# [1,] FALSE  TRUE
# [2,]  TRUE FALSE

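Another way we could have gotten %in% to use the user-defined match function would be to change the enclosing environment of %in% to the global environment. Because this is a replacement assignment run at top level, it doesn't modify base's own %in%; it binds a copy in the global environment whose enclosing environment is R_GlobalEnv: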

rm(`%in%`)  # Remove user-defined %in%
environment(`%in%`) <- .GlobalEnv    # Can be reversed with environment(`%in%`) <- asNamespace("base")
m %in% c(2, 3)
#       [,1]  [,2]
# [1,] FALSE  TRUE
# [2,]  TRUE FALSE
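A quick check that base's own %in% is left untouched and that rm() alone undoes the change. This is an illustration rather than part of the original answer, and it assumes the question's match is defined:

```r
# The question's matrix-aware match, defined in the global environment
match <- function(x, table, nomatch = NA_integer_, incomparables = NULL) {
  m <- base::match(x, table, nomatch, incomparables)
  if (is.matrix(x)) matrix(m, nrow(x)) else m
}
m <- matrix(1:4, nrow = 2)

# Binds a copy of base's %in% in the global environment with a new enclosing env
environment(`%in%`) <- .GlobalEnv

exists("%in%", envir = globalenv(), inherits = FALSE)      # TRUE
identical(environment(base::`%in%`), asNamespace("base"))  # TRUE: base version unchanged

with_copy <- m %in% c(2, 3)
is.matrix(with_copy)  # TRUE

# Removing the global copy restores the default behaviour
rm(`%in%`)
is.matrix(m %in% c(2, 3))  # FALSE
```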

As mentioned by the commenters on @Molx's answer, the most sensible thing to do is to avoid all this headache by naming my new function something else like %inm%.
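Following that advice, a minimal sketch of a separately named operator might look like this. The %inm% name comes from the thread; copying dimnames as well as dimensions is an assumption about what you would want:

```r
# Matrix-aware membership test; base %in% and match are left untouched
`%inm%` <- function(x, table) {
  res <- base::match(x, table, nomatch = 0L) > 0L
  # Copy dimensions and dimnames (if any) from x onto the logical result
  if (!is.null(dim(x))) {
    dim(res) <- dim(x)
    dimnames(res) <- dimnames(x)
  }
  res
}

m <- matrix(1:4, nrow = 2, dimnames = list(c("a", "b"), c("x", "y")))
m %inm% c(2, 3)
#       x     y
# a FALSE  TRUE
# b  TRUE FALSE

1:4 %inm% c(2, 3)  # plain vectors still give a plain vector
# [1] FALSE  TRUE  TRUE FALSE
```

Because the operator has its own name, code that relies on the usual vector result of %in% keeps working.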

Molx

I'm not sure why your attempt didn't work, but I imagine that %in% will use base:::match regardless of your redefined match. But why not redefine %in% itself?

`%in%` <- function(x, table) {
  m <- base::match(x, table, nomatch = 0L) > 0L
  if (is.matrix(x)) matrix(m, nrow(x))
  else m
}

m <- matrix(1:4, nrow=2)

m %in% c(2, 3)

#       [,1]  [,2]
# [1,] FALSE  TRUE
# [2,]  TRUE FALSE

As suggested in the comments, and as a matter of good practice, it would be safer to use a different name, such as %inm% or %min%.
