Reputation: 873
I am using the merge function on two data frames, A and B:
nrow(A)  # 11537
nrow(B)  # 734
But when I apply merge as follows:
m <- merge(A, B, all.x = TRUE, by = "id")
nrow(m)  # 29730
I get "m" with 29730 rows. I expected "m" to have only 11537 rows, since I am merging B into A. I can't identify the reason for this. Can somebody please help? What is being added to "A"?
The file is big, so I cannot check it manually.
Upvotes: 1
Views: 257
Reputation: 3462
If an id value occurs more than once in B (or in both data frames), merge returns one row for every combination of matching rows, so the result can have more rows than A. For example:
a <- data.frame(id = c(1, 1, 1, 2, 2), val = 1:5)
b <- data.frame(id = c(1, 1, 3, 2, 2), valb = 11:15)
m <- merge(a, b, by = "id", all.x = TRUE)
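m will have 10 rows: 6 with id=1 (3 rows in a times 2 rows in b) and 4 with id=2 (2 times 2). The id=3 row in b is dropped because it has no match in a.
My guess is that this is what causes your merged data.frame to be bigger than expected.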
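Since the file is too big to inspect by hand, here is a rough sketch of how you could check this programmatically. It reuses the toy a and b from above; the names dup_ids, n_in_b, expected_rows and b_first are just illustrative, and it assumes the id column contains no NA values. Swap in your own A and B to try it on the real data.

```r
# Toy data from the example above (replace with your own A and B)
a <- data.frame(id = c(1, 1, 1, 2, 2), val = 1:5)
b <- data.frame(id = c(1, 1, 3, 2, 2), valb = 11:15)

# Which ids appear more than once in b?
dup_ids <- unique(b$id[duplicated(b$id)])

# For each row of a, count how many rows of b share its id
n_in_b <- vapply(a$id, function(i) sum(b$id == i), integer(1))

# With all.x = TRUE, a row of a with k matches yields k rows,
# and a row with no match still yields 1 row (filled with NA)
expected_rows <- sum(pmax(n_in_b, 1L))

m <- merge(a, b, by = "id", all.x = TRUE)
nrow(m) == expected_rows

# One possible remedy, if a single row per id is acceptable:
# keep only the first row for each id in b before merging
b_first <- b[!duplicated(b$id), ]
m_first <- merge(a, b_first, by = "id", all.x = TRUE)
nrow(m_first) == nrow(a)
```

If dup_ids is non-empty for your B, that would explain the extra rows. Note that keeping only the first row per id silently discards the other rows of B, so whether that is appropriate depends on what the duplicates mean in your data; aggregating B by id first may be a better fit.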
Upvotes: 2