9877126

Reputation: 3

is.element() for more than one variable

unique() removes duplicate elements of a vector, and duplicate rows of an array.

is.element(), %in%, and match() work only on vectors (or NULL).

Are there any value matching or set operations for multiple variables? (e.g. rows of an array)

My current workaround is below. It is not very elegant, and it gives false matches whenever the values themselves contain "_".

match.multiple <- function(x, table, nomatch = NA_integer_, incomparables = NULL) {
  # Collapse each row into a single string, then match the strings
  x_vector <- apply(x, 1, paste, collapse = "_")
  table_vector <- apply(table, 1, paste, collapse = "_")
  match(x_vector, table_vector, nomatch, incomparables)
}

is.element.multiple <- "%in.multiple%" <- function (el, set) match.multiple(el, set, 0) > 0
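One way to avoid the separator problem is to skip string concatenation altogether. The following is only a sketch, not a built-in: match_rows is a hypothetical helper name, and it assumes x and table are matrices with the same number of columns. It gives every distinct row an integer id, built up column by column, and then calls match() on the ids.

```r
# Hypothetical helper (not part of base R): row-wise match() without pasting.
match_rows <- function(x, table, nomatch = NA_integer_) {
  stopifnot(ncol(x) == ncol(table))
  both <- rbind(x, table)            # stack so ids are shared between x and table
  id <- rep(1L, nrow(both))          # all rows start in the same group
  for (j in seq_len(ncol(both))) {
    # Number the distinct values of column j as 1..k
    col_id <- match(both[, j], unique(both[, j]))
    # Combine the previous group id with this column's id into one key,
    # then renumber the keys as 1..n so they stay small
    pair <- (id - 1) * max(col_id) + col_id
    id <- match(pair, unique(pair))
  }
  n <- nrow(x)
  match(id[seq_len(n)], id[n + seq_len(nrow(table))], nomatch = nomatch)
}

# Two different rows that collapse to the same string "a_b_c"
x   <- rbind(c("a_b", "c"))
tbl <- rbind(c("a", "b_c"))
collide <- paste(x, collapse = "_") == paste(tbl, collapse = "_")  # TRUE
res <- match_rows(x, tbl)                                          # NA
```

Because the ids are recomputed from the stacked rows on every call, this does more work than the paste() version; it trades some speed for not depending on a separator.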

Edit: adding a reproducible example

Let's say that you wish to buy a car that has an equal number of forward gears and carburetors. It can be 1 of each, 2 of each, etc. You don't know whether the cars that are available on the market (cf. mtcars) comply with your preferences.

preferences <- cbind(1:8, 1:8)
available <- cbind(mtcars$gear, mtcars$carb)

So you do a matching for both variables: gears and carburetors.

m <- match.multiple(preferences, available)
m
# [1] NA NA 12  1 NA NA NA NA
which(!is.na(m))
# [1] 3 4

These are the numbers of forward gears and carburetors that come in equal quantities.

willbuy <- m[!is.na(m)]
mtcars[willbuy, ]
#     mpg cyl  disp  hp drat   wt  qsec vs am gear carb
# 1: 16.4   8 275.8 180 3.07 4.07 17.40  0  0    3    3
# 2: 21.0   6 160.0 110 3.90 2.62 16.46  0  1    4    4

And these are catalogue entries for cars that you should consider.

Upvotes: 0

Views: 1940

Answers (2)

dww

Reputation: 31452

A function to find occurrences of a vector within rows of an array:

To test whether a vector (v) is a row of an array or matrix (m), we can construct a second matrix with the same dimensions as m, in which every row is a copy of v, and then find the rows where all elements of the two matrices are equal:

is.row.in.rows <- function(v,m) {
  which(length(v) == rowSums(m == matrix(v, nrow(m), ncol(m), byrow=TRUE)))
}

Note that it is also possible to perform the same test with a loop, using which(apply(m, 1, all.equal, v) == TRUE). However, the vectorised version above using rowSums is faster.
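To make the comparison step concrete, here is a small worked example. The toy matrix m and vector v are made up for illustration; the function is repeated so the snippet runs on its own.

```r
is.row.in.rows <- function(v, m) {
  which(length(v) == rowSums(m == matrix(v, nrow(m), ncol(m), byrow = TRUE)))
}

# Toy data (illustrative only): rows 1 and 3 are equal to v
m <- rbind(c(1, 2), c(3, 4), c(1, 2), c(1, 4))
v <- c(1, 2)

# Every row of 'target' is a copy of v
target <- matrix(v, nrow(m), ncol(m), byrow = TRUE)

# Element-wise comparison; a row matches only if all of its entries are TRUE,
# i.e. its row sum equals length(v)
eq <- m == target
n_equal <- rowSums(eq)

hits <- is.row.in.rows(v, m)
```

Row 4 shares only its first element with v, so its row sum is 1 and it is not reported. Rows of m that contain NA are never reported either, since their row sums are NA and which() drops them.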

Using this function to solve the reproducible example in the question:

a <- unlist(apply(preferences, MARGIN = 1, is.row.in.rows, available))
a
# [1] 12 13 14  1  2 10 11

mtcars[a,]
#               mpg cyl  disp  hp drat    wt  qsec vs am gear carb
# Merc 450SE    16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
# Merc 450SL    17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
# Merc 450SLC   15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
# Mazda RX4     21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
# Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
# Merc 280      19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
# Merc 280C     17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4

Upvotes: 0

bgoldst

Reputation: 35314

As I mentioned in the comments, my answer to this question can be adapted to solve this problem. Here's how it can be done, demonstrating with the OP's reproducible example:

avail <- cbind(mtcars$gear,mtcars$carb);
prefs <- cbind(1:8,1:8);
do.call(rbind,apply(prefs,1L,function(x) mtcars[findarray(avail,matrix(x,1L))[,1L],]));
##                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Merc 450SE    16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL    17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC   15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Mazda RX4     21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Merc 280      19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C     17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4

Upvotes: 1
