colin

Reputation: 2666

Conditionally replace a value in a data frame with a value from a second data frame

Say I have a data frame, d1, that looks like this:

  site code trait
1    1    A   1.0
2    2    B   1.3
3    3    A    NA
4    4    B   2.9
5    5    A    NA

Here is the dput to generate d1:

structure(list(site = 1:5, code = structure(c(1L, 2L, 1L, 2L, 
1L), .Label = c("A", "B"), class = "factor"), trait = c(1, 1.3, 
NA, 2.9, NA)), .Names = c("site", "code", "trait"), row.names = c(NA, 
-5L), class = "data.frame")

I have a second data frame, d2, that looks like this:

  code trait
1    A   1.5
2    B   2.5

Here is the dput to generate d2:

structure(list(code = structure(1:2, .Label = c("A", "B"), class = "factor"), 
    trait = c(1.5, 2.5)), .Names = c("code", "trait"), row.names = c(NA, 
-2L), class = "data.frame")

I would like a piece of code that replaces the NA values of trait with the trait value from d2 that matches the code character for a particular row in d1. The final output of d1 would look like this:

  site code trait
1    1    A   1.0
2    2    B   1.3
3    3    A   1.5
4    4    B   2.9
5    5    A   1.5

Things I've tried:

d1$trait<- ifelse(is.na(d1$trait),d2$trait[d2$code == d1$code],d1$trait)

When using this code I'm getting a warning:

Warning messages:
1: In is.na(e1) | is.na(e2) :
  longer object length is not a multiple of shorter object length
2: In ==.default(d2$code, d1$code) :
  longer object length is not a multiple of shorter object length

Upvotes: 3

Views: 296

Answers (3)

boshek

Reputation: 4406

You could also accomplish this without an intermediate object by using a dplyr pipe:

library(dplyr)

full_join(d1, d2, by="code") %>%
  mutate(trait=ifelse(is.na(trait.x), trait.y, trait.x)) %>%
  select(site, code, trait)

The advantage here is that no intermediate object is created, so you can keep working with the data in the same pipe.

Upvotes: 1

josliber

Reputation: 44320

Your ifelse syntax is close, but the problematic bit is:

d2$trait[d2$code == d1$code]

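Here, you are trying to look up the d2$trait value corresponding to each code value in d1, but == only compares d2$code and d1$code element by element, recycling the shorter vector (length 2) against the longer one (length 5), which is what triggers the warning. The lookup can instead be done with match, which returns, for each element of d1$code, the position of its first match in d2$code: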

d1$trait<- ifelse(is.na(d1$trait),d2$trait[match(d1$code, d2$code)], d1$trait)
d1
#   site code trait
# 1    1    A   1.0
# 2    2    B   1.3
# 3    3    A   1.5
# 4    4    B   2.9
# 5    5    A   1.5
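
To see what match contributes, here is a minimal sketch using the question's data. It rebuilds d1 and d2 with data.frame() instead of the dput output (an assumption for brevity), and idx is just an illustrative name for the intermediate index vector:

```r
# Rebuild the question's data (factor columns, as in the dput output)
d1 <- data.frame(site = 1:5,
                 code = factor(c("A", "B", "A", "B", "A")),
                 trait = c(1, 1.3, NA, 2.9, NA))
d2 <- data.frame(code = factor(c("A", "B")),
                 trait = c(1.5, 2.5))

# For each row of d1, the position of its code within d2$code
idx <- match(d1$code, d2$code)
idx
# [1] 1 2 1 2 1

# Indexing d2$trait by idx gives one candidate replacement per row of d1
d2$trait[idx]
# [1] 1.5 2.5 1.5 2.5 1.5
```

Because the result has the same length as d1$trait, ifelse can pick between the two vectors row by row without any recycling.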

An alternative would be to just replace the missing values, again using match to grab the relevant elements from d2$trait:

d1$trait[is.na(d1$trait)] <- d2$trait[match(d1$code[is.na(d1$trait)], d2$code)]
d1
#   site code trait
# 1    1    A   1.0
# 2    2    B   1.3
# 3    3    A   1.5
# 4    4    B   2.9
# 5    5    A   1.5

While match and merge are internally doing very similar things, I find the match syntax to be a bit easier to use because you don't need to create an intermediate object via merge and then grab the relevant information from that intermediate object.
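
If you need this in more than one place, the replacement can be wrapped in a small function. This is only a sketch: fill_na_from_lookup and its argument names are hypothetical, and the extra row with code "C" is added to the question's data purely to show what happens when a code has no entry in d2 (match returns NA, so the value stays missing):

```r
# Hypothetical helper: fill the NAs in x using a lookup table keyed by `key`
fill_na_from_lookup <- function(x, key, lookup_key, lookup_value) {
  missing <- is.na(x)
  # match() gives NA for keys absent from lookup_key, so those entries stay NA
  x[missing] <- lookup_value[match(key[missing], lookup_key)]
  x
}

# The question's data plus one illustrative row whose code is not in d2
d1 <- data.frame(site = 1:6,
                 code = c("A", "B", "A", "B", "A", "C"),
                 trait = c(1, 1.3, NA, 2.9, NA, NA))
d2 <- data.frame(code = c("A", "B"), trait = c(1.5, 2.5))

d1$trait <- fill_na_from_lookup(d1$trait, d1$code, d2$code, d2$trait)
d1$trait
# values: 1.0 1.3 1.5 2.9 1.5 NA  (site 6 stays NA because "C" is not in d2)
```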

Upvotes: 3

jogo

Reputation: 12559

It is a simple task for merge:

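# Left join: keep every row of d1 and attach the matching d2 value as trait.y
df12 <- merge(d1, d2, by="code", all.x=TRUE)
# Use the d2 value only where the d1 value is missing
df12$trait <- ifelse(is.na(df12$trait.x), df12$trait.y, df12$trait.x)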
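
Note that merge sorts the result on the key column by default and keeps the trait.x and trait.y helper columns, so the merged frame does not yet look like the desired output. A minimal sketch of the tidy-up, using the question's data rebuilt with data.frame() (result is an illustrative name):

```r
# Rebuild the question's data
d1 <- data.frame(site = 1:5,
                 code = factor(c("A", "B", "A", "B", "A")),
                 trait = c(1, 1.3, NA, 2.9, NA))
d2 <- data.frame(code = factor(c("A", "B")),
                 trait = c(1.5, 2.5))

df12 <- merge(d1, d2, by = "code", all.x = TRUE)
df12$trait <- ifelse(is.na(df12$trait.x), df12$trait.y, df12$trait.x)

# Restore the original site order and keep only the columns of interest
result <- df12[order(df12$site), c("site", "code", "trait")]
rownames(result) <- NULL
result
```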

Upvotes: 2
