Amer
Amer

Reputation: 2251

filling in columns with matching IDs from two dataframes in R

I have two dataframes (df1, df2). I want to fill in the AGE and SEX values from df1 to df2 conditioned on having the same ID between the two. I tried several ways using for-loop and checking subject ID match between the two data frame but I failed. The result should be as in the df3. I have a huge dataset, so I want a piece of code in R that can do this easily. I would appreciate your assistance in this. Thank you.

df1:
ID    AGE   SEX
90901   39  0
90902   28  0
90903   40  1

df2:
ID     AGE  SEX  Conc
90901   NA  NA    5
90901   NA  NA    10
90901   NA  NA    15
90903   NA  NA    30
90903   NA  NA    5
90902   NA  NA    2.45
90902   NA  NA    51
90902   NA  NA    1
70905   NA  NA    0.5

result:
df3:
ID     AGE  SEX  Conc
90901   39  0     5
90901   39  0     10
90901   39  0     15
90903   40  1    30
90903   40  1    5
90902   28  1    2.45
90902   28  0    51
90902   28  0     1
70905   NA  NA    0.5

Upvotes: 7

Views: 20295

Answers (3)

one
one

Reputation: 3902

Here is a dplyr(v1.0.0) approach, which only overwrites NA values in df2:

rows_patch(df2 %>%mutate(across(AGE:SEX,as.integer)),df1)

     ID AGE SEX  Conc
1 90901  39   0  5.00
2 90901  39   0 10.00
3 90901  39   0 15.00
4 90903  40   1 30.00
5 90903  40   1  5.00
6 90902  28   0  2.45
7 90902  28   0 51.00
8 90902  28   0  1.00
9 70905  NA  NA  0.50

Upvotes: 0

Rich Scriven
Rich Scriven

Reputation: 99331

You could use match with lapply for this. If we iterate [[ with matching on the ID column of each of the original data sets over a vector of names, we can get the desired result.

nm <- c("AGE", "SEX")
df2[nm] <- lapply(nm, function(x) df1[[x]][match(df2$ID, df1$ID)])
df2
#      ID AGE SEX  Conc
# 1 90901  39   0  5.00
# 2 90901  39   0 10.00
# 3 90901  39   0 15.00
# 4 90903  40   1 30.00
# 5 90903  40   1  5.00
# 6 90902  28   0  2.45
# 7 90902  28   0 51.00
# 8 90902  28   0  1.00
# 9 70905  NA  NA  0.50

Note that this is also quite a bit faster than merge.

Upvotes: 12

jed
jed

Reputation: 615

Try merge(df1, df2, by = "id"). This will merge your two data frames together. If your example is a good representation of your actual data, then you might want to go ahead and drop the age and sex columns from df2 before you merge.

df2$AGE <- NULL
df2$SEX <- NULL
df3 <- merge(df1, df2, by = "id")

If you need to keep rows from df2 even when you don't have a matching id in df1, then you do this:

df2 <- subset(df2, select = -c(AGE,SEX) )
df3 <- merge(df1, df2, by = "id", all.y = TRUE)

You can learn more about merge (or any r function) by typing ?merge() in your r console.

Upvotes: 8

Related Questions