Reputation: 167
I am trying to merge two data frames in R according to ID-variable , since in the first data frame the ID-variable is a factor and in the second data frame the ID-variable is a numeric vector
datC <- data.frame("ID" = c("001","010","100","110"), "X" = rnorm(4))
datD <- data.frame("ID" = c(001,010,100,110,210), "Y" = rpois(5,3))
merge(datC,datD,by="ID")
ID X Y
100 0.03284493 2
110 -0.41690986 3
As we see the 001,010 are missing , and it is clear for me why they are missing and how I can solve this
datC$ID <- as.numeric(as.character(datC$ID))
merge(datC,datD,by="ID")
I have tried to find another solution by let R avoid TO treat a vector of character as a factor
datC <- data.frame("ID" = c("001","010","100","110"), "X" = rnorm(4),stringsAsFactors = F)
datD <- data.frame("ID" = c(001,010,100,110,210), "Y" = rpois(5,3))
When I merged these two data frames according to the ID-variable , I expected that I will get null results , since we have a character vs numeric vectors but in fact R gave me the same results as the ID-Variable in the first data frame is a factor .
merge(datC,datD,by="ID")
ID X Y
100 -0.2797956 4
110 -1.0397791 4
So can someone explain me why my expectation was false !
Upvotes: 1
Views: 5876
Reputation: 66819
When you merge two columns, they are coerced to a common class
:
Factors, raw vectors and lists are converted to character vectors, and then
x
andtable
are coerced to a common type (the later of the two types in R's ordering, logical < integer < numeric < complex < character) before matching.(from the
?match
help page, linked from?merge
)
So your numeric
column becomes character
, as does your factor
:
as.character(c(001,010,100,110,210))
# [1] "1" "10" "100" "110" "210"
This explains the result seen in the OP. This sort of coercion is common throughout R.
Comment. For ID columns, I always use character
; it's a lot less hassle than the alternatives. sprintf
is handy for adding leading zeros, ensuring all IDs are the same character-length:
sprintf("%03d", c(001,010,100,110,210))
# [1] "001" "010" "100" "110" "210"
Upvotes: 1