hbabbar
hbabbar

Reputation: 967

Merging two dataframes with different keys of different data types in R

I have two dataframes

master

P_ID      C_ID   D_ID ...
520686814 1222   1281
336892632 1232   8181 ...

and

str(master$P_ID) : integer64 [1:12000] 520686814 336892632 140101133 85433815 339389222

Also

Dimension

p_id      year  gender state
520686814 2017  1      A
336892632 2016  2      B

and

str(Dimension$p_id) : num [1:9915433] 520686814 336892632 636774575 248552775...

So essentially I can see that there are records matching in both dataframes for a p_id. So I do a join using

joined <- merge(master, Dimension, by.x = "P_ID", by.y = "p_id")

And to my surprise, I get no records in joined. Does the join don't work on different data types even though they work while subsetting?

Upvotes: 1

Views: 464

Answers (1)

cparmstrong
cparmstrong

Reputation: 819

Assuming you're not performing any mathematical operations on them you could (some would argue should) store IDs as character strings instead. It's my understanding that this is a safer approach. Additionally, merging on strings is faster than merging on numerics.

master$P_ID <- as.character(master$P_ID)
Dimension$p_id <- as.character(Dimension$p_id)
joined <- merge(master, Dimension, by.x = "P_ID", by.y = "p_id")

The code above should get it done. It doesn't answer the question of why your merge fails but does provide you with a workaround.

Upvotes: 1

Related Questions