Reputation: 303

R matching more than 2 conditions and return the response value

Hi I have two data set where the first one is a set of index:

ind1<-rep(c("E","W"), times=20)
ind2<-sample(100:150, 40)
y<-c(1:40)
index<-data.frame(cbind(ind1, ind2, y))

The second data set is the one needs to be indexed.

x1<-sample(c("E","W","N"), 40, replace=TRUE)
x2<-sample(100:150, 40)
x3<-rep(0, times=40)
data<-data.frame(cbind(x1,x2,x3))

I would like indicate in x3 where the x1 and x2 in data to be matched with ind1 and ind2 in index respectively and return the corresponding y.

index1<-split(index, index$ind1)
data1<-split(data, data$x1)
data1$E$x3<-match(data1$E$x2, index1$E$ind2)
data1$W$x3<-match(data1$W$x2, index1$W$ind2)

It kinda matched the way I wanted but did not return y correctly. Which part I did wrong? Thanks.

Also, is there a faster/smarter way of doing it? Because I might have more conditions to match with. Originally I tried if else statement but didn't work.

Upvotes: 5

Answers (4)

GKi

Reputation: 39707

This question is related to match two data.frames based on multiple columns.

You can use interaction or paste as already suggested by Dinre, to match on multiple columns.

#Write the row number of index in x3 which matches
data$x3 <- match(interaction(data[c("x1", "x2")]), interaction(index[c("ind1","ind2")]))

#In case you want to return 0 instead of NA for nomatch
data$x3 <- match(interaction(data[c("x1", "x2")]), interaction(index[c("ind1","ind2")]), nomatch=0)

#Instead of >interaction< you could also use paste as already suggested by Dinre
data$x3 <- match(paste(data$x1, data$x2), paste(index$ind1, index$ind2))

Upvotes: 1

Woldemar G

Reputation: 156

you can use also left_join() from dplyr package:

require(dplyr)
left_join(data, index, by = c("x1" = "ind1", "x2" = "ind2"))

read more here

Upvotes: 4

Drew Steen

Reputation: 16617

merge(data, index, by.x=c("ind1", "ind2"), by.y=c("x1", "x2"), all.x=TRUE, all.y=FALSE)

will give you the x and y values for each matching combination of ind1 and ind2, and x1 and x2. All combinations of x1 and x2 will be kept (even if that combination of ind1 and ind2 doesn't occur in index, but combinations of ind1 and ind2 that don't occur in data will be dropped. As written, the solution will keep x3 and y values, but if you'd like to drop the y values you can use merge(data[ ,-3], ... as per @Ferdinand.kraft 's suggestion.

Upvotes: 7

Dinre

Reputation: 4216

There are many ways to approach this, and it really depends on the characteristics of your data. Here is the most direct matching approach:

Pasting: The 'paste' function allows you to create a string from multiple pieces of data. If you are a matching between data sets using columns that have identical matches, you can simply paste the columns together and compare directly using a 'match' statement, like so:

new_data <- data

new_data$x3 <- ifelse(
    is.na(match(paste(data$x1, data$x2), paste(index$ind1, index$ind2))),
    0,
    index$y)

The match statement here compares exact matches between x1+x2 and ind1+ind2 pairs and returns an integer indicating which index pair location corresponds to each data row. If no match is found, NA is returned. By checking for NA in an 'ifelse' statement, we then write zeros for the NA values and return the corresponding y value for any matches.

Upvotes: 4

R matching more than 2 conditions and return the response value

Answers (4)

Related Questions