Reputation: 754
I have been using dplyr
and the merge
function for a while, but I just ran into a task that I can't figure out how to do in a simple way. I would like to merge two data frames. The second data frame has some columns missing. I want the output of merge to have these values filled out with the corresponding values of the first data frame.
The data frames look like this:
#df1
participant = c(1,2,3)
age = c(18,23,19)
gender = c(rep("female",2),"male")
experiment = rep("noun",3)
rt = c(218,111,546)
df1 = data.frame(participant, age, gender,experiment,rt)
participant age gender experiment rt
1 1 18 female noun 218
2 2 23 female noun 111
3 3 19 male noun 546
#df2
participant = c(1,2,3)
experiment = rep("verb",3)
rt = c(238,251,140)
df2 = data.frame(participant, experiment,rt)
participant experiment rt
1 1 verb 238
2 2 verb 251
3 3 verb 140
This is how the output data frame should look like:
participant age gender experiment rt
1 1 18 female noun 218
2 2 23 female noun 111
3 3 19 male noun 546
4 1 18 female verb 238
5 2 23 female verb 251
6 3 19 male verb 140
The goal is that df2
is merged using the column participant, and the values for the columns age and gender are taken from df1
. I have tried many things (all the types of join
, rbind.fill
, etc) and I have also looked around in the forum, but I can't find a simple way to achieve binding the rows of the two data frames and filling out missing values in df2
using a specific column as a reference. In my example, there are only 2 columns to be filled out (age and gender), but my real df1
has many more, so I would like to avoid specifying them by hand.
If any of you data merging wizards have a suggestion, I would be very grateful!
Upvotes: 2
Views: 2381
Reputation: 887961
We place the datasets in a list
, use rbindlist
with fill=TRUE
(from data.table
), grouped by 'participant, we assign (:=
) the 'age' and 'gender' as the non-NA values in that column.
library(data.table)
rbindlist(list(df1, df2), fill=TRUE)[, c('age', 'gender') :=
lapply(.SD, na.omit) , participant, .SDcols=age:gender][]
# participant age gender experiment rt
#1: 1 18 female noun 218
#2: 2 23 female noun 111
#3: 3 19 male noun 546
#4: 1 18 female verb 238
#5: 2 23 female verb 251
#6: 3 19 male verb 140
Or using only base R
, we rbind
the subset of columns that are common in both the datasets, and then create the additional columns based on the columns that are found in one dataset and not in other.
i1 <- intersect(names(df1), names(df2))
res <- rbind(df1[i1], df2[i1])
nm1 <- setdiff(names(df1),i1)
res[nm1] <- df1[nm1]
res
# participant experiment rt age gender
#1 1 noun 218 18 female
#2 2 noun 111 23 female
#3 3 noun 546 19 male
#4 1 verb 238 18 female
#5 2 verb 251 23 female
#6 3 verb 140 19 male
Upvotes: 4