Ayush Raj Singh

Reputation: 873

Bug when I use "merge()" in R

I am using the merge function on two data frames, A and B:

nrow(A)  # 11537
nrow(B)  # 734

But when I apply the merge function as follows:

m <- merge(A,B,all.x=TRUE,by="id")

nrow(m)  # 29730

I get "m" with 29730 rows. I expected "m" to have only 11537 rows, since I am merging B into A with all.x=TRUE. I cannot identify the reason for this. Can somebody please help? What is being added to "A"?

The file is big, so I cannot check it manually.

Upvotes: 1

Views: 257

Answers (1)

amit

Reputation: 3462

If your id values aren't unique within each data.frame, then merge() creates one row for every combination of matching rows. For example:

a <- data.frame(id=c(1,1,1,2,2), val=1:5)
b <- data.frame(id=c(1,1,3,2,2), valb=11:15)
m <- merge(a, b, by="id", all.x=TRUE)

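m will have 10 rows: 6 with id=1 (3 rows in a times 2 rows in b) and 4 with id=2 (2 times 2). The id=3 row of b does not appear, because all.x=TRUE only keeps unmatched rows from a.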

My guess is that this is what causes your merged data.frame to become bigger than expected.
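Since the file is too big to inspect by hand, a check along these lines might help confirm the guess. It is only a sketch on the toy data frames a and b from the example; the same calls should work on A and B, assuming the key column is called id as in the question. Keeping the first row per id in the last step is an arbitrary choice made here for illustration, and which row you actually want to keep depends on your data.

```r
# Toy data from the example above; substitute A and B for real use.
a <- data.frame(id = c(1, 1, 1, 2, 2), val = 1:5)
b <- data.frame(id = c(1, 1, 3, 2, 2), valb = 11:15)

# Which ids occur more than once in b?
dup_ids_b <- unique(b$id[duplicated(b$id)])

# For every row of a, count the rows of b with the same id.
n_match <- vapply(a$id, function(k) sum(b$id == k), integer(1))

# With all.x = TRUE, a row of a with no match is still kept once.
n_match[n_match == 0] <- 1L

# Predicted number of rows in the merged result.
expected_rows <- sum(n_match)

m <- merge(a, b, by = "id", all.x = TRUE)
nrow(m) == expected_rows  # TRUE

# One possible fix (an assumption, not the only option):
# keep only the first row per id in b before merging.
b_unique <- b[!duplicated(b$id), ]
m_dedup <- merge(a, b_unique, by = "id", all.x = TRUE)
nrow(m_dedup) == nrow(a)  # TRUE
```

If dup_ids_b is non-empty for your B, the duplicated ids are the likely source of the extra rows.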

Upvotes: 2
