Reputation: 1
I am using merge() function in R. I want to merge two dataframes. However, I am interested in the observations that don't merge. Is there a way to produce output that shows the observations that did not successfully merge?
Example: DataframeA=1138888 obs DataframeB=1010651 obs I want to see which obs were not included in DataframeB that were included in DataFrameA.
I have read through the merge() documentation and saw all.x, all.y commands. However, I really want a new dataframe that represents those observations that would NOT merge.
Thank you an advance for any help.
Upvotes: 0
Views: 762
Reputation: 21497
Dont know how computationally effectiv the following solution is but it works
with subset()
df.a<-data.frame(key=sample(1:100,50),"a")
df.b<-data.frame(key=sample(1:100,50),"b")
db.bNOTa<-subset(df.b,!(df.b$key %in% df.a$key))
Edit:
Another solution using match
would be:
db.bNOTa<-merge(df.b,df.a,all.x=TRUE)
db.bNOTa<-subset(db.bNOTa,!is.na(db.bNOTa[,3]))
In MySQL this would be a Left Join like Pointed out here: SQL JOIN To Find Records That Don't Have a Matching Record With a Specific Value
Upvotes: 1