Reputation: 569
I want to write a function in R that does the following: I have a table of cases, and some data. I want to find the correct row matching to each observation from the data. Example:
crit1 <- c(1,1,2)
crit2 <- c("yes","no","no")
Cases <- matrix(c(crit1,crit2),ncol=2,byrow=FALSE)
data1 <- c(1,2,1)
data2 <- c("no","no","yes")
data <- matrix(c(data1,data2),ncol=2,byrow=FALSE)
Now I want a function that returns for each row of my data, the matching row from Cases
, the result would be the vector
c(2,3,1)
Upvotes: 0
Views: 46
Reputation: 34733
Are you sure you want to be using matrices for this?
Note that the numeric data in crit1
and data1
has been converted to string (matrices can only store one data type):
typeof(data[ , 1L])
# [1] character
In R, a data.frame
is a much more natural choice for what you're after. data.table
is (among many other things) a toolset for working with "enhanced" data.frame
s; See the Introduction.
I would create your data as:
Cases = data.table(crit1, crit2)
data = data.table(data1, data2)
We can get the matching row indices as asked by doing a keyed join (See the vignette on keys):
setkey(Cases) # key by all columns
Cases
# crit1 crit2
# 1: 1 no
# 2: 1 yes
# 3: 2 no
setkey(data)
data
# data1 data2
# 1: 1 no
# 2: 1 yes
# 3: 2 no
Cases[data, which=TRUE]
# [1] 1 2 3
This differs from 2,3,1
because the order of your data has changed, but note that the answer is still correct.
If you don't want to change the order of your data, it's slightly more complicated (but more readable if you're not used to data.table
syntax):
Cases = data.table(crit1, crit2)
data = data.table(data1, data2)
Cases[data, on = setNames(names(data), names(Cases)), which=TRUE]
# [1] 2 3 1
The on=
part creates the mapping between the columns of data
and those of Cases
.
We could write this in a bit more SQL-like fashion as:
Cases[data, on = .(crit1 == data1, crit2 == data2), which=TRUE]
# [1] 2 3 1
This is shorter and more readable for your sample data, but not as extensible if your data has many columns or if you don't know the column names in advance.
Upvotes: 4
Reputation: 3876
The prodlim
package has a function for that:
library(prodlim)
row.match(data,Cases)
[1] 2 3 1
Upvotes: 3