Reputation: 155
Suppose I have two data frames:
A <- data.frame("SerialNum" = integer(), "Year" = integer(), stringsAsFactors = F)
A[1,] <- c(93843, 2001)
A[2,] <- c(12458, 2007)
A[3,] <- c(11112, 2000)
A[4,] <- c(18293, 2013)
A[5,] <- c(81203, 2014)
A[6,] <- c(11112, 2001)
A[7,] <- c(11112, 2013)
A[8,] <- c(11112, 2014)
B <- data.frame("SerialNum" = integer(), "Year" = integer(), stringsAsFactors= F)
B[1:3,] <- rbind(c(11112, 2000), c(18293, 2013),c(81203, 2014))
B[4,] <- c(48639, 2012)
B[5,] <- c(00128, 2003)  # note: stored as 128, since numeric literals drop leading zeros
B[6,] <- c(67942, 2005)
What I want to do is to create new data frames: A_Match, which contains all of the entries of A common to B, and B_Match, with all of the entries of B common to A. Doing this with a for loop is quite easy, but it's too slow for my actual data. The tricky part with my data is that different years may contain the same serial number, so I have to check both serial number and year in order to subset my data properly. What is an object-oriented way to do this in R? I'm not sure which functions can help me with this task. My for loop is:
L_A <- nrow(A)
L_B <- nrow(B)
A_Inds <- integer()
B_Inds <- integer()
# For each row of A, collect the rows of B with the same serial number and year
for (i in seq_len(L_A)) {
  IncNums <- which(B$SerialNum == A$SerialNum[i])
  YNums <- which(B$Year == A$Year[i])
  B_Inds <- union(B_Inds, intersect(IncNums, YNums))
}
# For each row of B, collect the rows of A with the same serial number and year
for (i in seq_len(L_B)) {
  IncNums <- which(A$SerialNum == B$SerialNum[i])
  YNums <- which(A$Year == B$Year[i])
  A_Inds <- union(A_Inds, intersect(IncNums, YNums))
}
A_Match <- A[unique(A_Inds),]
B_Match <- B[unique(B_Inds),]
Upvotes: 2
Views: 66
Reputation: 1902
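I believe a semi-join is also equivalent, and it may scale better to large datasets. Naming the key columns in by makes it explicit that both SerialNum and Year must match:
library(dplyr)
A_Match <- semi_join(A, B, by = c("SerialNum", "Year"))
B_Match <- semi_join(B, A, by = c("SerialNum", "Year"))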
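If dplyr is not available, the same semi-join idea can probably be approximated in base R with merge(). The following is only a sketch under a few assumptions: the sample data is rebuilt directly rather than row by row, keys is an illustrative name that does not appear in the question, and unique() is there so that repeated key pairs in the other data frame cannot multiply rows. Note that merge() does not preserve the original row names or row order.

```r
# Sample data from the question, built in one step
A <- data.frame(SerialNum = c(93843, 12458, 11112, 18293, 81203, 11112, 11112, 11112),
                Year      = c(2001, 2007, 2000, 2013, 2014, 2001, 2013, 2014))
B <- data.frame(SerialNum = c(11112, 18293, 81203, 48639, 128, 67942),
                Year      = c(2000, 2013, 2014, 2012, 2003, 2005))

# Columns that together identify a row (illustrative name)
keys <- c("SerialNum", "Year")

# Inner-join each data frame against the other's distinct key pairs
A_Match <- merge(A, unique(B[keys]), by = keys)
B_Match <- merge(B, unique(A[keys]), by = keys)

A_Match
B_Match
```

On the sample data both results should contain the three serial number/year pairs shared by A and B. With real data that carries extra columns, each result keeps only the columns of its own data frame.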
Upvotes: 0
Reputation: 44330
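You can build a combined key from the serial number and year, use %in% to check whether each key from one data frame is present in the other, and then use standard row indexing to limit to the matches. Matching on SerialNum alone is not enough here, because serial number 11112 appears in A under several years:
(A_Match <- A[paste(A$SerialNum, A$Year) %in% paste(B$SerialNum, B$Year),])
#   SerialNum Year
# 3     11112 2000
# 4     18293 2013
# 5     81203 2014
(B_Match <- B[paste(B$SerialNum, B$Year) %in% paste(A$SerialNum, A$Year),])
#   SerialNum Year
# 1     11112 2000
# 2     18293 2013
# 3     81203 2014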
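A quick check on the question's sample data illustrates the difference between matching on the serial number alone and matching on the serial number/year pair. This is just a sketch: serial_only and pair_match are illustrative names, and the data frames are rebuilt directly here so the snippet runs on its own.

```r
# Sample data from the question, built in one step
A <- data.frame(SerialNum = c(93843, 12458, 11112, 18293, 81203, 11112, 11112, 11112),
                Year      = c(2001, 2007, 2000, 2013, 2014, 2001, 2013, 2014))
B <- data.frame(SerialNum = c(11112, 18293, 81203, 48639, 128, 67942),
                Year      = c(2000, 2013, 2014, 2012, 2003, 2005))

# Serial number only: flags every row of A whose serial appears anywhere in B
serial_only <- A$SerialNum %in% B$SerialNum
which(serial_only)
# [1] 3 4 5 6 7 8

# Serial number and year together: flags only rows whose pair appears in B
pair_match <- paste(A$SerialNum, A$Year) %in% paste(B$SerialNum, B$Year)
which(pair_match)
# [1] 3 4 5
```

Rows 6 to 8 of A share serial number 11112 with B but carry years that B does not contain, so only the pair-based check excludes them.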
Upvotes: 2