Evan

Reputation: 1499

Replacing a string with a matched number in a column in R

I have a data frame in R with 10,000 columns and roughly 4,000 rows. The data are IDs such as rs100987 and rs1803920. Each rsID has a corresponding iHS score between 0 and 3. I have a separate data frame in which one column lists every possible rsID and the next column holds the corresponding iHS score. I want to convert my data frame of rsIDs into a data frame of the same dimensions that contains the corresponding iHS scores instead. How do I do this?

This is what my file looks like now:

input ID     match 1    match 2     match 3 ......
rs6708       rs10089   rs100098    rs10567
rs8902       rs18079   rs234058    rs123098
rs9076       rs77890   rs445067    rs105023

This is what my iHS score file looks like (it has a matching score for every ID in the above file):

snpID     iHS
rs6708    1.23
rs105023   0.92
rs234058  2.31
rs77890   0.31

I would like my output to look like this:

match 1   match 2   match 3
0.89      0.34      2.45
1.18      2.31      0.67
0.31      1.54      0.92

Upvotes: 0

Views: 90

Answers (1)

josliber

Reputation: 44309

Let's consider a small example:

(dat <- data.frame(id1 = c("rs100987", "rs1803920"), id2=c("rs123", "rs456"), stringsAsFactors=FALSE))
#         id1   id2
# 1  rs100987 rs123
# 2 rs1803920 rs456
(dat2 <- data.frame(id=c("rs123", "rs456", "rs100987", "rs1803920", "rs123456"),
                   score=5:1, stringsAsFactors=FALSE))
#          id score
# 1     rs123     5
# 2     rs456     4
# 3  rs100987     3
# 4 rs1803920     2
# 5  rs123456     1

Then you can do this operation with:

apply(dat, 2, function(x) dat2$score[match(x, dat2$id)])
#      id1 id2
# [1,]   3   5
# [2,]   2   4

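The call to match finds, for each id in a column, the row of dat2 that holds that id; indexing dat2$score with those row numbers then returns the scores. apply runs this over every column and returns a matrix. Any id that does not appear in dat2$id comes back as NA.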
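
If you want a data frame back rather than the matrix that apply returns, one possible variant is sketched below. It uses the same match lookup but assumes the layout from the question. The names ids, scores, input_id and match1 to match3 are hypothetical stand-ins for your real objects and columns. Only the four scores shown in the question's excerpt are included, so every other ID comes out as NA here.

```r
# Hypothetical data mirroring the question's layout; values copied from its excerpts
ids <- data.frame(
  input_id = c("rs6708", "rs8902", "rs9076"),
  match1   = c("rs10089", "rs18079", "rs77890"),
  match2   = c("rs100098", "rs234058", "rs445067"),
  match3   = c("rs10567", "rs123098", "rs105023"),
  stringsAsFactors = FALSE
)
scores <- data.frame(
  snpID = c("rs6708", "rs105023", "rs234058", "rs77890"),
  iHS   = c(1.23, 0.92, 2.31, 0.31),
  stringsAsFactors = FALSE
)

# Columns whose IDs should be translated; the input ID column is left out
match_cols <- setdiff(names(ids), "input_id")

# Assigning into result[] replaces the contents column by column
# while keeping the data frame class and the column names
result <- ids[match_cols]
result[] <- lapply(result, function(x) scores$iHS[match(x, scores$snpID)])
```

With the full score table, where every ID has an entry, result would contain no NA values. Calling as.matrix(result) converts it to a numeric matrix if that is more convenient downstream.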

Upvotes: 1
