Reputation: 2251
I have two dataframes (df1, df2). I want to fill in the AGE and SEX values from df1 to df2 conditioned on having the same ID between the two. I tried several ways using for-loop and checking subject ID match between the two data frame but I failed. The result should be as in the df3. I have a huge dataset, so I want a piece of code in R that can do this easily. I would appreciate your assistance in this. Thank you.
df1:
ID AGE SEX
90901 39 0
90902 28 0
90903 40 1
df2:
ID AGE SEX Conc
90901 NA NA 5
90901 NA NA 10
90901 NA NA 15
90903 NA NA 30
90903 NA NA 5
90902 NA NA 2.45
90902 NA NA 51
90902 NA NA 1
70905 NA NA 0.5
result:
df3:
ID AGE SEX Conc
90901 39 0 5
90901 39 0 10
90901 39 0 15
90903 40 1 30
90903 40 1 5
90902 28 1 2.45
90902 28 0 51
90902 28 0 1
70905 NA NA 0.5
Upvotes: 7
Views: 20295
Reputation: 3902
Here is a dplyr(v1.0.0)
approach, which only overwrites NA values in df2
:
rows_patch(df2 %>%mutate(across(AGE:SEX,as.integer)),df1)
ID AGE SEX Conc
1 90901 39 0 5.00
2 90901 39 0 10.00
3 90901 39 0 15.00
4 90903 40 1 30.00
5 90903 40 1 5.00
6 90902 28 0 2.45
7 90902 28 0 51.00
8 90902 28 0 1.00
9 70905 NA NA 0.50
Upvotes: 0
Reputation: 99331
You could use match
with lapply
for this. If we iterate [[
with matching on the ID
column of each of the original data sets over a vector of names, we can get the desired result.
nm <- c("AGE", "SEX")
df2[nm] <- lapply(nm, function(x) df1[[x]][match(df2$ID, df1$ID)])
df2
# ID AGE SEX Conc
# 1 90901 39 0 5.00
# 2 90901 39 0 10.00
# 3 90901 39 0 15.00
# 4 90903 40 1 30.00
# 5 90903 40 1 5.00
# 6 90902 28 0 2.45
# 7 90902 28 0 51.00
# 8 90902 28 0 1.00
# 9 70905 NA NA 0.50
Note that this is also quite a bit faster than merge
.
Upvotes: 12
Reputation: 615
Try merge(df1, df2, by = "id")
. This will merge your two data frames together. If your example is a good representation of your actual data, then you might want to go ahead and drop the age and sex columns from df2 before you merge.
df2$AGE <- NULL
df2$SEX <- NULL
df3 <- merge(df1, df2, by = "id")
If you need to keep rows from df2 even when you don't have a matching id in df1, then you do this:
df2 <- subset(df2, select = -c(AGE,SEX) )
df3 <- merge(df1, df2, by = "id", all.y = TRUE)
You can learn more about merge
(or any r function) by typing ?merge()
in your r console.
Upvotes: 8