Reputation: 1499
I have a data frame in R with 10,000 columns and roughly 4,000 rows. Every cell holds an rsID (for example rs100987, rs1803920, etc.). Each rsID has a corresponding iHS score between 0 and 3. I have a separate data frame in which one column lists every possible rsID and the next column gives its iHS score. I want to replace each rsID in the first data frame with its corresponding iHS score, so that I end up with a data frame of the same dimensions that contains scores instead of IDs. How do I do this?
This is what my file looks like now:
input ID   match 1   match 2    match 3    ......
rs6708     rs10089   rs100098   rs10567
rs8902     rs18079   rs234058   rs123098
rs9076     rs77890   rs445067   rs105023
This is what my iHS score file looks like (it has matching scores for every ID in the above file):
snpID iHS
rs6708 1.23
rs105023 0.92
rs234058 2.31
rs77890 0.31
I would like my output to look like this:
match 1 match 2 match 3
0.89 0.34 2.45
1.18 2.31 0.67
0.31 1.54 0.92
Upvotes: 0
Views: 90
Reputation: 44309
Let's consider a small example:
(dat <- data.frame(id1 = c("rs100987", "rs1803920"), id2=c("rs123", "rs456"), stringsAsFactors=FALSE))
# id1 id2
# 1 rs100987 rs123
# 2 rs1803920 rs456
(dat2 <- data.frame(id=c("rs123", "rs456", "rs100987", "rs1803920", "rs123456"),
score=5:1, stringsAsFactors=FALSE))
# id score
# 1 rs123 5
# 2 rs456 4
# 3 rs100987 3
# 4 rs1803920 2
# 5 rs123456 1
Then you can do this operation with:
apply(dat, 2, function(x) dat2$score[match(x, dat2$id)])
# id1 id2
# [1,] 3 5
# [2,] 2 4
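The call to match figures out the row in dat2 corresponding to each id in your column, and apply with MARGIN = 2 repeats this for every column. Note that apply returns a matrix rather than a data frame, and any id that is missing from dat2 comes back as NA.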
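To make the match step concrete, and as one possible variation that keeps the result a data frame, here is a minimal sketch on the same toy data. It assumes the ID columns are character vectors (not factors), as in the example above; `lookup` and `scores` are illustrative names that do not come from the question or the answer.

```r
# Same toy data as above; IDs must be character, not factor,
# because the lookup below indexes by name.
dat <- data.frame(id1 = c("rs100987", "rs1803920"),
                  id2 = c("rs123", "rs456"),
                  stringsAsFactors = FALSE)
dat2 <- data.frame(id = c("rs123", "rs456", "rs100987", "rs1803920", "rs123456"),
                   score = 5:1, stringsAsFactors = FALSE)

# match() returns, for each id, its row position in dat2$id
pos <- match(dat$id1, dat2$id)
pos
# [1] 3 4

# Indexing the scores by those positions gives the scores for that column
dat2$score[pos]
# [1] 3 2

# Variation: a named vector used as a lookup table (names = ids, values = scores)
lookup <- setNames(dat2$score, dat2$id)

# Assigning into scores[] replaces every column but keeps the data frame
# structure and column names; ids without a score become NA.
scores <- dat
scores[] <- lapply(dat, function(col) unname(lookup[col]))
scores
#   id1 id2
# 1   3   5
# 2   2   4
```

On the real data, a column such as the input ID column could be dropped first (for example by selecting only the match columns) if it should not appear in the output.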
Upvotes: 1