Mnifldz

Reputation: 155

R: Subsetting Data Without a For Loop

Suppose I have two data frames:

A <- data.frame(SerialNum = c(93843, 12458, 11112, 18293, 81203, 11112, 11112, 11112),
                Year      = c( 2001,  2007,  2000,  2013,  2014,  2001,  2013,  2014))

B <- data.frame(SerialNum = c(11112, 18293, 81203, 48639, 00128, 67942),
                Year      = c( 2000,  2013,  2014,  2012,  2003,  2005))

What I want to do is create two new data frames: A_Match, containing all rows of A that also appear in B, and B_Match, containing all rows of B that also appear in A. This is easy with a for loop, but the loop is too slow for my actual data. The tricky part is that the same serial number can appear in different years, so a row only counts as a match when both the serial number and the year agree.

What is a vectorized (loop-free) way to do this in R? I'm not sure which functions can help with this task. My for loop is:

L_A    <- nrow(A)
L_B    <- nrow(B)
A_Inds <- integer()
B_Inds <- integer()

# For each row of A, record the rows of B with the same SerialNum AND Year
for (i in seq_len(L_A)) {
  IncNums <- which(B$SerialNum == A$SerialNum[i])
  YNums   <- which(B$Year == A$Year[i])
  B_Inds  <- union(B_Inds, intersect(IncNums, YNums))
}

# Same check in the other direction: rows of A matching some row of B
for (i in seq_len(L_B)) {
  IncNums <- which(A$SerialNum == B$SerialNum[i])
  YNums   <- which(A$Year == B$Year[i])
  A_Inds  <- union(A_Inds, intersect(IncNums, YNums))
}

A_Match <- A[unique(A_Inds), ]
B_Match <- B[unique(B_Inds), ]
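One loop-free alternative in base R, sketched under the assumption that only the SerialNum and Year columns define a match: tag each row with its position, inner-join the two frames on both key columns with merge(), and keep the positions that survive. The helper name matching_rows and the row_id column are hypothetical, introduced only for this example.

```r
# Example data from the question
A <- data.frame(SerialNum = c(93843, 12458, 11112, 18293, 81203, 11112, 11112, 11112),
                Year      = c(2001, 2007, 2000, 2013, 2014, 2001, 2013, 2014))
B <- data.frame(SerialNum = c(11112, 18293, 81203, 48639, 128, 67942),
                Year      = c(2000, 2013, 2014, 2012, 2003, 2005))

keys <- c("SerialNum", "Year")

# Hypothetical helper: positions of the rows of x whose values in all `by`
# columns occur together in some row of y.
matching_rows <- function(x, y, by) {
  # Remember each row's original position in a temporary row_id column
  tagged <- cbind(x[by], row_id = seq_len(nrow(x)))
  # Inner join on both key columns; unique() stops duplicate key pairs in y
  # from repeating a row of x
  joined <- merge(tagged, unique(y[by]), by = by)
  # merge() sorts by the keys, so restore the original row order
  sort(joined$row_id)
}

A_Match <- A[matching_rows(A, B, keys), ]
B_Match <- B[matching_rows(B, A, keys), ]
A_Match
#   SerialNum Year
# 3     11112 2000
# 4     18293 2013
# 5     81203 2014
```

Deduplicating y before the join means each row of x is returned at most once, which mirrors the union() calls in the loop.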

Upvotes: 2

Views: 66

Answers (2)

atiretoo

Reputation: 1902

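I believe this is equivalent to your loop, and it may work better for large datasets. semi_join() keeps the rows of its first argument that have a match in the second on the by columns (here both SerialNum and Year):

library(dplyr)
A_Match <- semi_join(A, B, by = c("SerialNum", "Year"))
B_Match <- semi_join(B, A, by = c("SerialNum", "Year"))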
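A semi-join differs from an ordinary inner join in that it never duplicates rows of the first data frame and adds no columns from the second. The base-R sketch below illustrates that difference; semi_join_base is a hypothetical helper written for this example, not part of dplyr, and it assumes the key values never contain a carriage return.

```r
# Example data from the question
A <- data.frame(SerialNum = c(93843, 12458, 11112, 18293, 81203, 11112, 11112, 11112),
                Year      = c(2001, 2007, 2000, 2013, 2014, 2001, 2013, 2014))
B <- data.frame(SerialNum = c(11112, 18293, 81203, 48639, 128, 67942),
                Year      = c(2000, 2013, 2014, 2012, 2003, 2005))

# Hypothetical base-R helper mimicking semi-join semantics: keep the rows of x
# whose values in all `by` columns occur together in some row of y.
semi_join_base <- function(x, y, by) {
  # Collapse the key columns into one string per row; "\r" is assumed not to
  # appear in the data, so distinct key combinations give distinct strings
  key <- function(df) do.call(paste, c(unname(as.list(df[by])), sep = "\r"))
  x[key(x) %in% key(y), , drop = FALSE]
}

keys  <- c("SerialNum", "Year")
B_dup <- rbind(B, B[1, ])  # B with the (11112, 2000) row repeated

nrow(merge(A, B_dup, by = keys))      # 4: the inner join repeats the duplicated match
nrow(semi_join_base(A, B_dup, keys))  # 3: one row per matching row of A
```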

Upvotes: 0

josliber

Reputation: 44330

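You can use %in% to check whether each row of one data frame is present in the other and then use standard row indexing to keep only the matches. Because the same serial number can occur in several years, paste SerialNum and Year together into a single key so that a row only matches when both agree:

(A_Match <- A[paste(A$SerialNum, A$Year) %in% paste(B$SerialNum, B$Year), ])
#   SerialNum Year
# 3     11112 2000
# 4     18293 2013
# 5     81203 2014
(B_Match <- B[paste(B$SerialNum, B$Year) %in% paste(A$SerialNum, A$Year), ])
#   SerialNum Year
# 1     11112 2000
# 2     18293 2013
# 3     81203 2014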
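To check that matching on a pasted SerialNum/Year key selects the same rows as the loop in the question, both can be run on the example data. The function names loop_match and key_match are hypothetical wrappers written for this comparison.

```r
# Example data from the question
A <- data.frame(SerialNum = c(93843, 12458, 11112, 18293, 81203, 11112, 11112, 11112),
                Year      = c(2001, 2007, 2000, 2013, 2014, 2001, 2013, 2014))
B <- data.frame(SerialNum = c(11112, 18293, 81203, 48639, 128, 67942),
                Year      = c(2000, 2013, 2014, 2012, 2003, 2005))

# The question's loop, wrapped as a function (hypothetical name): keep the
# rows of x whose SerialNum and Year both match some row of y.
loop_match <- function(x, y) {
  inds <- integer()
  for (i in seq_len(nrow(y))) {
    hits <- intersect(which(x$SerialNum == y$SerialNum[i]),
                      which(x$Year == y$Year[i]))
    inds <- union(inds, hits)
  }
  x[sort(inds), ]
}

# Vectorized version (hypothetical name): one pasted key per row, then %in%
key_match <- function(x, y) {
  x[paste(x$SerialNum, x$Year) %in% paste(y$SerialNum, y$Year), ]
}

all.equal(loop_match(A, B), key_match(A, B))  # TRUE
all.equal(loop_match(B, A), key_match(B, A))  # TRUE
```

The vectorized version does one pass over each data frame instead of one which() scan per row, which is where the loop's cost comes from on large inputs.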

Upvotes: 2
