Merge function generates duplicates

Question

This question is just to understand why this would happen.

I'm merging two databases:

bot.rep.geo <- merge(x = bot.rep, y = geo.2016, by = "cod.geo", all.x = TRUE)

The original databases have the following dimensions: bot.rep has 1634451 observations, geo.2016 has 1393.

After merging with all.x = TRUE, the result has 1727681 rows instead of the 1634451 in bot.rep.

Why does this happen?

After a quick review, I realised the merge was creating some duplicates, but I don't understand why, or whether I'm using the merge function incorrectly.

user1923975 · Accepted Answer

There may be rows in the geo.2016 table where the same cod.geo value appears twice or more.

If bot.rep has a row with a cod.geo value of "X" and geo.2016 has 2 rows with cod.geo "X", the merge will duplicate that bot.rep row and join it to each of the 2 rows from geo.2016. all.x = TRUE only guarantees that every row of bot.rep is kept; it does not limit each row to a single match.
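A minimal sketch of this behaviour, using small made-up tables in place of the real bot.rep and geo.2016 (the values, the region column, and the names dup.keys, geo.unique and merged.unique are all hypothetical). It reproduces the extra row, lists the keys that are repeated in the lookup table, and shows one possible way to get back to one row per key:

```r
# Toy stand-ins for the real tables (hypothetical data, not the asker's).
bot.rep <- data.frame(
  cod.geo = c("A", "B", "X"),
  value   = c(10, 20, 30),
  stringsAsFactors = FALSE
)
geo.2016 <- data.frame(
  cod.geo = c("A", "B", "X", "X"),          # "X" appears twice
  region  = c("North", "South", "East", "East-2"),
  stringsAsFactors = FALSE
)

# Same call as in the question: the "X" row of bot.rep matches two rows.
merged <- merge(x = bot.rep, y = geo.2016, by = "cod.geo", all.x = TRUE)
nrow(bot.rep)  # 3
nrow(merged)   # 4

# Keys that occur more than once in the lookup table.
dup.keys <- unique(geo.2016$cod.geo[duplicated(geo.2016$cod.geo)])
dup.keys       # "X"

# One possible fix: keep the first row per key before merging.
# Only appropriate if the repeated rows really are redundant.
geo.unique <- geo.2016[!duplicated(geo.2016$cod.geo), ]
merged.unique <- merge(x = bot.rep, y = geo.unique, by = "cod.geo", all.x = TRUE)
nrow(merged.unique)  # 3, same as bot.rep
```

Running the dup.keys check on the real geo.2016 should show which cod.geo values are responsible. Whether to drop the repeated rows or keep them depends on whether they carry different information.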
