PMaier

Compare matching entries in large data frames

Suppose I have two data frames with the following general structure:

A <- data.frame(ID = c(1, 1, 2, 3, 6, 10), Obs = c(0, 5, 6, 7, 3, -4))
B <- data.frame(ID = c(1, 3, 2, 4, 8), Obs = c(10, -5, NA, 7, NA))
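To see which rows are actually involved, the matching IDs can be found without a loop. Here is a minimal sketch on the two example frames using base R's intersect() and %in%. The names common.ID and A.matched are illustrative only:

```r
A <- data.frame(ID = c(1, 1, 2, 3, 6, 10), Obs = c(0, 5, 6, 7, 3, -4))
B <- data.frame(ID = c(1, 3, 2, 4, 8), Obs = c(10, -5, NA, 7, NA))

# IDs present in both frames (the duplicate ID 1 in A collapses to one entry)
common.ID <- intersect(A$ID, B$ID)
print(common.ID)

# Rows of A whose ID also occurs in B; ID 1 contributes two rows
A.matched <- A[A$ID %in% B$ID, ]
print(A.matched)
```

Note that ID 1 occurs twice in A, so any per-ID comparison has to decide which of its observations to use.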

For matching IDs, I want to report:

There are, however, a couple of complications:

So far, using R, I've processed the data frames with a for loop and if statements. Some of my code looks roughly like this:

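results.signflip <- data.frame()
results.missingvalue <- data.frame()
Intersection.ID <- intersect(A$ID, B$ID)

for (idx.row in seq_along(Intersection.ID)) {
  # exact matches, so that ID 1 does not also pick up ID 10 (as grep("^1") would)
  idx.selection.A <- which(A$ID == Intersection.ID[idx.row])
  idx.selection.B <- which(B$ID == Intersection.ID[idx.row])

  # compare the first observation for this ID in each frame
  obs.A <- A[idx.selection.A[1], "Obs"]
  obs.B <- B[idx.selection.B[1], "Obs"]

  # sign flip: both values present and their signs differ
  if (!is.na(obs.A) && !is.na(obs.B) && sign(obs.A) != sign(obs.B))
    results.signflip <- rbind(results.signflip, A[idx.selection.A[1], ])

  # (... more if statements ...)
}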

This is obviously a simple and not very efficient way to tackle the problem. The trouble is that the file has some 70,000 entries, and the script runs for hours.
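A likely reason for the long running time is that grep() scans the whole ID column on every iteration, and rbind() copies the growing result frame each time a row is added. Below is a minimal sketch of a vectorized alternative using base R's match(), which looks up all IDs in one call. It assumes, as an illustration, that only the first observation per ID in each frame should be compared; the question does not say how duplicated IDs are meant to be handled. The names pos.A, pos.B, is.missing and is.signflip are hypothetical:

```r
A <- data.frame(ID = c(1, 1, 2, 3, 6, 10), Obs = c(0, 5, 6, 7, 3, -4))
B <- data.frame(ID = c(1, 3, 2, 4, 8), Obs = c(10, -5, NA, 7, NA))
common.ID <- intersect(A$ID, B$ID)

# match() returns the position of the first occurrence of each ID
# (assumption: the first observation per ID is the one to compare)
pos.A <- match(common.ID, A$ID)
pos.B <- match(common.ID, B$ID)
obs.A <- A$Obs[pos.A]
obs.B <- B$Obs[pos.B]

# Logical vectors take the place of the per-row if statements
is.missing <- is.na(obs.A) | is.na(obs.B)
is.signflip <- !is.missing & sign(obs.A) != sign(obs.B)

# Subset once at the end instead of calling rbind() inside a loop
results.signflip <- A[pos.A[is.signflip], ]
results.missingvalue <- A[pos.A[is.missing], ]
print(results.signflip)
print(results.missingvalue)
```

Because `!is.missing` is FALSE wherever a value is NA, the sign comparison never leaves an NA in the subsetting index.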

So my question is: is there a smarter, more efficient way to do this?

Answers (1)

Roland

This should get you started:

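# one row per combination of A and B rows sharing an ID
C <- merge(A, B, by = "ID")
# sign of the product: 1 = same sign, -1 = sign flip, 0 = a zero value, NA = missing value
C$switch <- sign(C$Obs.x * C$Obs.y)
# per ID, keep the first non-zero or NA result (na.action = identity keeps the NA rows)
aggregate(switch ~ ID, C[C$switch != 0 | is.na(C$switch),], head, n = 1, na.action = identity)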
#  ID switch
#1  1      1
#2  2     NA
#3  3     -1

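Once the frames are merged, the result tables that the question builds with rbind() in a loop can each be collected with a single logical subset. This is a sketch that assumes a sign flip means strictly opposite signs (a negative product, i.e. switch == -1 above) and that a row counts as missing when either observation is NA. The names missing.row and flip.row are illustrative:

```r
A <- data.frame(ID = c(1, 1, 2, 3, 6, 10), Obs = c(0, 5, 6, 7, 3, -4))
B <- data.frame(ID = c(1, 3, 2, 4, 8), Obs = c(10, -5, NA, 7, NA))
C <- merge(A, B, by = "ID")

# One logical test per merged row, evaluated for all rows at once
missing.row <- is.na(C$Obs.x) | is.na(C$Obs.y)
flip.row <- !missing.row & C$Obs.x * C$Obs.y < 0

# Subset once instead of growing a data frame with rbind() in a loop
results.signflip <- C[flip.row, ]
results.missingvalue <- C[missing.row, ]
print(results.signflip)
print(results.missingvalue)
```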
Some specifics may still need adjusting, but including them would make the question too broad for my taste; the general idea of merging should be enough to get you moving forward.
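One such specific is duplicated IDs. ID 1 appears twice in A, so merge() produces one row per combination. If, hypothetically, only the first observation per ID should count, base R's duplicated() can drop the repeats before merging. The rule itself is an assumption, and A.first and B.first are illustrative names:

```r
A <- data.frame(ID = c(1, 1, 2, 3, 6, 10), Obs = c(0, 5, 6, 7, 3, -4))
B <- data.frame(ID = c(1, 3, 2, 4, 8), Obs = c(10, -5, NA, 7, NA))

# Keep only the first row for each ID (assumed rule for handling duplicates)
A.first <- A[!duplicated(A$ID), ]
B.first <- B[!duplicated(B$ID), ]

# After de-duplication the merge yields at most one row per ID
C <- merge(A.first, B.first, by = "ID")
print(C)
```

Unlike the aggregate() call above, which skips zero products and takes the first remaining row, this keeps the first observation even when it is zero. Which behaviour is right depends on the intended rule.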
