order
order

Reputation: 375

Mapping rows from one matrix to another. Shared unique ids

I have three tables. One which contains the results. The other two contain notes about unique ids that appear in the results. I will try to explain the format a little bit.

RESULTS

UniqueID1 UniqueID2 [Columns containing the results from some experiment.]

NOTES1

UniqueID [Notes about this particular uniqued ID]

NOTES2

same as above.

So, I want every row in the results to contain the notes from the NOTES table. I now accomplish this iteratively; this process now takes longer than the experiment itself. I have considered using subsetting or lapply perhaps, but can't seem to find something fitting.

Some things to consider.

Regarding results: The two UniqueID columns in results are disjoint. They are not sets; they contain repetition. However, each pair of unique IDs occurs exactly one time in the results. table.

Regarding NOTES: The unique id column does not contain repetition.

The details I have given above are part of why I haven't just lazily subsetted the NOTES table and called cbind on results and the subset of NOTES.

I believe I have devoted a decent bit of thought to this; please don't downvote just because the answer is trivial to you. I haven't included code, because I want to know how accomplish this for the general case in R without iteration. If you think that it will help, I could include some code.

Sample Tables:

Results

"UniqueID_1"    "UniqueID_2"    "P_val" "Corr_Coef"
"M200000001"    "M200000113"    "0.530536974203384" "-0.115081242888051"
"M200000001"    "M200000113"    "0.530536974203384" "-0.115081242888051"
"M200000001"    "M200000114"    "0.252990706016934" "0.399292132865147"
"M200000001"    "M200000115"    "0.368094246871692" "0.16456647390621"
"M200000001"    "M200000116"    "0.78457345841227"  "-0.0502948318806314"
"M200000001"    "M200000117"    "0.874022710686625" "-0.0291832590390869"
"M200000001"    "M200000121"    "0.199903265024637" "0.232729988792225"
"M200000001"    "M200000122"    "0.892630363051574" "-0.024845579436259"
"M200000001"    "M200000126"    "0.27862153489629"  "-0.208127935743872"

NOTES

M200000002  22352   Vim
M200000004  20810   Srm
M200000005  17878   Myf6
M200000006  27061   Bcap31
M200000007  54204   1-Sep
M200000008  71995   Erv3
M200000009  15384   Hnrpab
M200000010  14453   Gas2
M200000011  12054   Bcl7b
M200000012  26572   Cops3
M200000014  17896   Myl4
M200000016  52469   Ccdc56
M200000017  19720   Trim27
M200000019  11434   Acr

Upvotes: 1

Views: 515

Answers (2)

seancarmody
seancarmody

Reputation: 6290

merge(results, notes, by="UniqueID_1", all.x=TRUE)

or, for the case of two notes:

merge(merge(results, notes1, by="UniqueID_1", all.x=TRUE),
      notes1, by="UniqueID_2", all.x=TRUE)

You haven't told us the names of your NAMES columns, so if the ID column in each is actually UniqueID then you'd need to do something like this:

merge(merge(results, notes1, by.x="UniqueID_1", by.y="UniqueID", all.x=TRUE),
      notes1, by.x="UniqueID_2", by.y="UniqueID", all.x=TRUE)

The notes from notes1 and notes2 will be in different columns.

Upvotes: 3

Aaron - mostly inactive
Aaron - mostly inactive

Reputation: 37784

It still isn't clear to me why subset and cbind aren't working for you; it would seem something like this would work fine.

 cbind(Results, Notes1[Results$UniqueID_1,], Notes2[Results$UniqueID_2,])

This would require the rownames of the Notes tables to be the ID; perhaps that's the issue?

 rownames(Notes1) <- Notes1$UniqueID
 rownames(Notes2) <- Notes2$UniqueID

match is also useful in cases like this; it's what merge is using behind the scenes.

Upvotes: 0

Related Questions