Reputation: 155
Suppose I have two distinct data sets, Data1 and Data2. For each entry in Data1$Incidents I want to find the rows in Data2$Incidents which match it, and also keep track of the entries which have no matches. I then save the matching entries into a new data frame, Data1_Match. Next, for each entry in Data2$Incidents I look for the matching entries in Data1_Match$Incidents, and create an analogous data frame, Data2_Match.
Suppose for the sake of argument my data sets look like the following:
Day Incidents
"Monday" 30
"Friday" 11
"Sunday" 27
My algorithm at the moment looks like the following:
Data1_Incs = as.integer(Data1$Incidents)
LEN1 = length(Data1_Incs)
No_Match = integer(0)  # indices of Data1 rows with no match in Data2
for (k in seq_len(LEN1)){
  Incs = which(Data2$Incidents == Data1_Incs[k])
  if (length(Incs) == 0){
    No_Match = c(No_Match, k)
  }
}
# Index by the complement so an empty No_Match does not drop every row
Data1_Match <- Data1[setdiff(seq_len(LEN1), No_Match),]
Data1_No_Match <- Data1[No_Match,]

Data2_Incs = Data2$Incidents
LEN2 = length(Data2_Incs)
Un_Match = integer(0)  # indices of Data2 rows with no match in Data1_Match
for (j in seq_len(LEN2)){
  Incs = which(as.integer(Data1_Match$Incidents) == Data2_Incs[j])
  if (length(Incs) == 0){
    Un_Match = c(Un_Match, j)
  }
}
Data2_Match <- Data2[setdiff(seq_len(LEN2), Un_Match),]
Data2_No_Match <- Data2[Un_Match,]
What is a better way for me to accomplish this task, without using a for loop? For reference, Data1 has about 15,000 entries while Data2 has closer to two million.
Upvotes: 3
Views: 82
Reputation: 1895
Try using setdiff.
I will demonstrate on the first for loop:
No_Match <- setdiff(Data1$Incidents, Data2$Incidents)
This returns the unmatched values rather than row indices, so I am not sure it quite satisfies your requirement.
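If row positions are needed, one possible alternative is %in%, which returns a logical vector with one element per row. The following is only a sketch: the contents of both data frames are made up for illustration, and the as.integer() conversion is carried over from the question on the assumption that Incidents holds whole numbers.

```r
# Hypothetical sample data; only the column names come from the question.
Data1 <- data.frame(Day = c("Monday", "Friday", "Sunday"),
                    Incidents = c(30, 11, 27))
Data2 <- data.frame(Day = c("Tuesday", "Monday", "Saturday", "Friday"),
                    Incidents = c(11, 30, 30, 5))

# TRUE for each row of Data1 whose Incidents value occurs in Data2
in_data2 <- as.integer(Data1$Incidents) %in% Data2$Incidents
Data1_Match    <- Data1[in_data2, ]
Data1_No_Match <- Data1[!in_data2, ]

# Same test in the other direction, against the matched Data1 rows only
in_data1 <- Data2$Incidents %in% as.integer(Data1_Match$Incidents)
Data2_Match    <- Data2[in_data1, ]
Data2_No_Match <- Data2[!in_data1, ]
```

Because the subsetting uses logical vectors rather than negative indices, it still behaves sensibly when every row matches or none does.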
Upvotes: 3