Reputation: 38760
I have a "master" dataframe that has the following columns:
userid, condition
Since there are four experiment conditions, I also have four dataframes that carry answer information, with the following columns:
userid, condition, answer1, answer2
Now, I'd like to join these, so all combinations of user IDs, conditions and their answers to these conditions are merged. Each condition should only have the correct answer in the appropriate column, per row.
master = data.frame(userid=c("foo","foo","foo","foo","bar","bar","bar","bar"), condition=c("A","B","C","D","A","B","C","D"))
cond_a = data.frame(userid=c("foo","bar"), condition="A", answer1=c("1","1"), answer2=c("2","2"))
cond_b = data.frame(userid=c("foo","bar"), condition="B", answer1=c("3","3"), answer2=c("4","4"))
cond_c = data.frame(userid=c("foo","bar"), condition="C", answer1=c("5","5"), answer2=c("6","6"))
cond_d = data.frame(userid=c("foo","bar"), condition="D", answer1=c("7","7"), answer2=c("8","8"))
How do I merge all conditions into the master, so the master table looks like follows?
userid condition answer1 answer2
1 bar A 1 2
2 bar B 3 4
3 bar C 5 6
4 bar D 7 8
5 foo A 1 2
6 foo B 3 4
7 foo C 5 6
8 foo D 7 8
I've tried the following:
temp = merge(master, cond_a, all.x=TRUE)
Which gives me:
userid condition answer1 answer2
1 bar A 1 2
2 bar B <NA> <NA>
3 bar C <NA> <NA>
4 bar D <NA> <NA>
5 foo A 1 2
6 foo B <NA> <NA>
7 foo C <NA> <NA>
8 foo D <NA> <NA>
But as soon as I do this…
merge(temp, cond_b, all.x=TRUE)
There are no values for condition B
. How come?
userid condition answer1 answer2
1 bar A 1 2
2 bar B <NA> <NA>
3 bar C <NA> <NA>
4 bar D <NA> <NA>
5 foo A 1 2
6 foo B <NA> <NA>
7 foo C <NA> <NA>
8 foo D <NA> <NA>
Upvotes: 9
Views: 28780
Reputation: 7784
You can express this join as a SQL
statement, and then use the sqldf
library to execute it.
cond_all = rbind(cond_a, cond_b, cond_c, cond_d)
> sqldf('select p.userid as userid, p.condition as condition, answer1, answer2 from master as p join cond_all as q on p.userid=q.userid and p.condition=q.condition order by userid, condition')
userid condition answer1 answer2
1 bar A 1 2
2 bar B 3 4
3 bar C 5 6
4 bar D 7 8
5 foo A 1 2
6 foo B 3 4
7 foo C 5 6
8 foo D 7 8
You mentioned in a comment that the master dataframe has extra columns that do not exist in the cond dataframes. You should be able to modify this SQL query to still work for this case.
Upvotes: 1
Reputation: 2651
As stated by the OP, given that no explicit relationship with the master data frame, an option is this:
temp <-rbind(cond_a,cond_b,cond_c,cond_d)
temp[order(temp["userid"]),]
Perhaps if any relationship was known, there could be a non-simplistic solution.
Upvotes: 2
Reputation: 193677
You can use Reduce()
and complete.cases()
as follows:
merged <- Reduce(function(x, y) merge(x, y, all=TRUE),
list(master, cond_a, cond_b, cond_c, cond_d))
merged[complete.cases(merged), ]
# userid condition answer1 answer2
# 1 bar A 1 2
# 2 bar B 3 4
# 4 bar C 5 6
# 6 bar D 7 8
# 8 foo A 1 2
# 9 foo B 3 4
# 11 foo C 5 6
# 13 foo D 7 8
Reduce()
might take some getting accustomed to. You define your function, and then provide a list
of objects to repeatedly apply the function to. Thus, that statement is like doing:
temp1 <- merge(master, cond_a, all=TRUE)
temp2 <- merge(temp1, cond_b, all=TRUE)
temp3 <- merge(temp2, ....)
Or something like:
merge(merge(merge(master, cond_a, all=TRUE), cond_b, all=TRUE), cond_c, all=TRUE)
complete.cases()
creates a logical vector of whether the specified columns are "complete" or not; this logical vector can be used to subset from the merged data.frame
.
Upvotes: 13