Reputation: 10604
Update: My apologies... I should have specified that there are missing values for some tests (come conducted n times, others conducted m times), and I'm thinking that's going to affect the answer based on the merge
documentation:
all.x
logical; if TRUE, then extra rows will be added to the output, one for each row in x that has no matching row in y. These rows will have NAs in those columns that are usually filled with values from y. The default is FALSE, so that only rows with data from both x and y are included in the output.
I don't want either of these options: namely, I don't want only matching row/column values kept, nor do I want non-matching row/column intersection values filled in with NAs.
I have data from an experiment and stored my chemical compositions in one file and testing in another. I'd like to merge these so I can plot boxplots faceted by composition factor level, but am having a hard time figuring out how to do it. Here's some example data (adjusted per the above):
set.seed(1)
df.ids <- data.frame(lot=c("A","B","C"),
var1=c(1,2,3),
var2=c(4,5,6))
df.testing <- data.frame(lot=c(rep("A",4), rep("B",4),rep("C",4)),
test1=rep(c(runif(3,min=1,max=5),NA),3),
test2=rep(c(runif(2,min=1,max=5),NA,NA),3),
test3=rep(runif(4,min=1,max=5),3))
Like my real data, the ids data frame has one row per factor level, but the test data has multiple for each test replicate. How might I merge these so that I fill in the compositions for each of the lots? In other words, my resultant data frame would look like this:
| lot | test1 | test2 | test3 | var1 | var2 |
|-----+----------+----------+----------+------+------|
| A | 2.062035 | 4.632831 | 4.593559 | 1 | 4 |
| A | 2.488496 | 1.806728 | 4.778701 | 1 | 4 |
| A | 3.291413 | NA | 3.643191 | 1 | 4 |
| A | NA | NA | 3.516456 | 1 | 4 |
| B | 2.062035 | 4.632831 | 4.593559 | 2 | 5 |
| B | 2.488496 | 1.806728 | 4.778701 | 2 | 5 |
| B | 3.291413 | NA | 3.643191 | 2 | 5 |
| B | NA | NA | 3.516456 | 2 | 5 |
| C | 2.062035 | 4.632831 | 4.593559 | 3 | 6 |
| C | 2.488496 | 1.806728 | 4.778701 | 3 | 6 |
| C | 3.291413 | NA | 3.643191 | 3 | 6 |
| C | NA | NA | 3.516456 | 3 | 6 |
In the end, I suppose all I'm trying to do is match unique lot IDs and then just replicate the values from df.ids
for each of it's columns into df.testing
. I just don't know how (well, I thought of doing it with a for()
loop, but I'm sure there's a better way)!
Upvotes: 1
Views: 7915
Reputation: 263481
You can just rearrange the order of the columns:
> merge(df.ids, df.testing)
lot var1 var2 test1 test2
1 A 1 4 3.389578 11.55608
2 A 1 4 9.502077 15.47929
3 A 1 4 9.173870 17.18321
4 B 2 5 4.349115 12.85377
5 B 2 5 2.815137 14.45693
6 B 2 5 6.947180 17.45857
7 C 3 6 9.085507 17.92857
8 C 3 6 6.155680 12.58901
9 C 3 6 6.662026 19.92715
> merge(df.ids, df.testing)[c(1,4:5,2:3)]
lot test1 test2 var1 var2
1 A 3.389578 11.55608 1 4
2 A 9.502077 15.47929 1 4
3 A 9.173870 17.18321 1 4
4 B 4.349115 12.85377 2 5
5 B 2.815137 14.45693 2 5
6 B 6.947180 17.45857 2 5
7 C 9.085507 17.92857 3 6
8 C 6.155680 12.58901 3 6
9 C 6.662026 19.92715 3 6
This turns out to be an inner-join because there are no missing items in the merge-by columns, but the by.x and by.y parameters can be chosen so you get a left-outer or a right-outer join if needed. If for some reason you wanted the order of the items in df.testing to be maintained, you would need to create a column that got carried along, which would then be order()ed-by.
Upvotes: 2