Say I have a data frame, d1, that looks like this:
site code trait
1 1 A 1.0
2 2 B 1.3
3 3 A NA
4 4 B 2.9
5 5 A NA
Here is the dput to generate d1:
structure(list(site = 1:5, code = structure(c(1L, 2L, 1L, 2L,
1L), .Label = c("A", "B"), class = "factor"), trait = c(1, 1.3,
NA, 2.9, NA)), .Names = c("site", "code", "trait"), row.names = c(NA,
-5L), class = "data.frame")
I have a second data frame, d2, that looks like this:
code trait
1 A 1.5
2 B 2.5
Here is the dput to generate d2:
structure(list(code = structure(1:2, .Label = c("A", "B"), class = "factor"),
trait = c(1.5, 2.5)), .Names = c("code", "trait"), row.names = c(NA,
-2L), class = "data.frame")
I would like a piece of code that replaces each NA value of trait in d1 with the trait value from d2 whose code matches the code of that row in d1. The final output of d1 would look like this:
site code trait
1 1 A 1.0
2 2 B 1.3
3 3 A 1.5
4 4 B 2.9
5 5 A 1.5
Things I've tried:
d1$trait <- ifelse(is.na(d1$trait), d2$trait[d2$code == d1$code], d1$trait)
When I run this code, I get these warnings:
Warning messages:
1: In is.na(e1) | is.na(e2) :
  longer object length is not a multiple of shorter object length
2: In `==.default`(d2$code, d1$code) :
  longer object length is not a multiple of shorter object length
Answer:
You could also accomplish this without an intermediate object by using a dplyr pipe:
library(dplyr)
# left_join keeps every row of d1 (and only those), adding trait.x and trait.y
d1 %>%
  left_join(d2, by = "code") %>%
  mutate(trait = ifelse(is.na(trait.x), trait.y, trait.x)) %>%
  select(site, code, trait)
The advantage here is that you don't need an intermediate object at all, and the result can be passed straight on to further steps in the pipe (assign it back to d1 if you want to keep it).
Answer:
Your ifelse syntax is close, but the problematic bit is:
d2$trait[d2$code == d1$code]
Here, you are trying to look up the d2$trait value corresponding to the code value of each row of d1, but you are actually just comparing d2$code to d1$code element by element. Because d2$code has only 2 elements and d1$code has 5, R recycles the shorter vector (which is what triggers the warnings), so the result is not a lookup at all. The operation can instead be accomplished with match:
d1$trait <- ifelse(is.na(d1$trait), d2$trait[match(d1$code, d2$code)], d1$trait)
d1
# site code trait
# 1 1 A 1.0
# 2 2 B 1.3
# 3 3 A 1.5
# 4 4 B 2.9
# 5 5 A 1.5
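To see the difference concretely, it can help to print both index vectors. This is a small sketch that rebuilds d1 and d2 with data.frame() instead of dput (assumed equivalent to the question's data). With this particular data the recycled comparison happens to be all TRUE, so the logical index just returns d2$trait padded with NA, and rows 3 and 5 would stay NA; match() instead returns one position in d2 per row of d1:

```r
# Rebuild the example data (assumed equivalent to the dput output in the question)
d1 <- data.frame(site = 1:5,
                 code = factor(c("A", "B", "A", "B", "A")),
                 trait = c(1, 1.3, NA, 2.9, NA))
d2 <- data.frame(code = factor(c("A", "B")), trait = c(1.5, 2.5))

# Element-wise comparison: d2$code (length 2) is recycled against d1$code
# (length 5), which is what produces the length warnings
same <- suppressWarnings(d2$code == d1$code)
same
# A length-5 logical index on a length-2 vector: positions 3 to 5 come back NA
d2$trait[same]

# match() gives, for each element of d1$code, its position in d2$code
idx <- match(d1$code, d2$code)
idx
d2$trait[idx]
```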
An alternative would be to replace only the missing values, again using match to grab the relevant elements from d2$trait:
d1$trait[is.na(d1$trait)] <- d2$trait[match(d1$code[is.na(d1$trait)], d2$code)]
d1
# site code trait
# 1 1 A 1.0
# 2 2 B 1.3
# 3 3 A 1.5
# 4 4 B 2.9
# 5 5 A 1.5
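If this fill-from-lookup step comes up repeatedly, the second approach can be wrapped in a small function. fill_na_from_lookup below is a hypothetical helper written for illustration, not part of base R or any package. Keys with no match in the lookup are left as NA, because match() returns NA for them:

```r
# Hypothetical helper (not from base R or any package): fill the NA entries
# of x with the lookup_value whose lookup_key matches the corresponding key
fill_na_from_lookup <- function(x, key, lookup_key, lookup_value) {
  miss <- is.na(x)
  # match() returns positions in lookup_key; unmatched keys give NA
  x[miss] <- lookup_value[match(key[miss], lookup_key)]
  x
}

d1 <- data.frame(site = 1:5,
                 code = factor(c("A", "B", "A", "B", "A")),
                 trait = c(1, 1.3, NA, 2.9, NA))
d2 <- data.frame(code = factor(c("A", "B")), trait = c(1.5, 2.5))

d1$trait <- fill_na_from_lookup(d1$trait, d1$code, d2$code, d2$trait)
d1$trait
# [1] 1.0 1.3 1.5 2.9 1.5
```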
While match and merge are internally doing very similar things, I find the match syntax a bit easier to use because you don't need to create an intermediate object via merge and then grab the relevant information from it.
Answer:
This is a simple task for merge:
df12 <- merge(d1, d2, by = "code", all.x = TRUE)
df12$trait <- ifelse(is.na(df12$trait.x), df12$trait.y, df12$trait.x)
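One thing to watch with merge is that, by default (sort = TRUE), the result is sorted on the by column, so the rows no longer follow d1's site order, and the helper columns trait.x and trait.y are still present. A minimal sketch of tidying that up, assuming the same d1 and d2 as in the question (rebuilt here with data.frame()):

```r
d1 <- data.frame(site = 1:5,
                 code = factor(c("A", "B", "A", "B", "A")),
                 trait = c(1, 1.3, NA, 2.9, NA))
d2 <- data.frame(code = factor(c("A", "B")), trait = c(1.5, 2.5))

# merge() sorts the result on the key column by default (sort = TRUE)
df12 <- merge(d1, d2, by = "code", all.x = TRUE)
df12$trait <- ifelse(is.na(df12$trait.x), df12$trait.y, df12$trait.x)

# Restore d1's row order and keep only the original columns
df12 <- df12[order(df12$site), c("site", "code", "trait")]
rownames(df12) <- NULL
df12
```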