desired login

Reputation: 1190

Lookup table for new column in dataframe in R

I have a dataframe, dat, where one of the columns, dat$QC, contains integer quality codes. I want to add a new column, QS, containing the string that describes the quality code for each row.

Here's what I have tried: I stored the quality codes in a vector, qcIDs, and the strings in another vector, qcStrings. Then I loop over these and populate the new column in the dataframe accordingly, like this:

qcIDs <- c(1,2,3)
qcStrings <- c('foo', 'bar', 'baz')
for (ii in 1:length(qcIDs)) {
    dat$QS[dat$QC == qcIDs[ii]] <- qcStrings[ii]
}

I'm new to R and have often read that there are better ways of solving problems than with for loops. Is there a more R-ish way of approaching this? Does the above look as clumsy as it feels to me? Thanks.

Upvotes: 3

Views: 772

Answers (3)

Matthew Lundberg

Reputation: 42669

Since it's most useful to end up with a factor in the data frame, simply create the factor using the given parameters. Here's an example:

(dat <- data.frame(QC=rep(c(1,2,3), 2)) )
##   QC
## 1  1
## 2  2
## 3  3
## 4  1
## 5  2
## 6  3

Your parameters for the factor creation:

qcIDs <- c(1,2,3)
qcStrings <- c('foo', 'bar', 'baz')

Use these to encode a factor in dat:

dat$QC <- factor(dat$QC, levels=qcIDs, labels=qcStrings)
dat
##    QC
## 1 foo
## 2 bar
## 3 baz
## 4 foo
## 5 bar
## 6 baz

I didn't time this, but I expect it to be faster than any sort of merge: factor() only matches each value against the short levels vector and stores integer codes, so no table join is involved.
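Note that the factor() call above overwrites QC. If the goal is to keep the integer codes and add a separate QS column, as the question asks, the same call can be assigned to a new column instead. A minimal sketch, reusing dat, qcIDs and qcStrings from above; the match()-based variant and the column name QS_chr are illustrative additions, not part of the original answer:

```r
dat <- data.frame(QC = rep(c(1, 2, 3), 2))
qcIDs <- c(1, 2, 3)
qcStrings <- c('foo', 'bar', 'baz')

# Keep QC untouched and store the labels in a new factor column QS
dat$QS <- factor(dat$QC, levels = qcIDs, labels = qcStrings)

# Alternative when a plain character column is preferred (QS_chr is a
# hypothetical name): match() gives the position of each QC value in
# qcIDs, or NA when the code is not in the lookup vector
dat$QS_chr <- qcStrings[match(dat$QC, qcIDs)]
```

Either way, codes missing from qcIDs end up as NA rather than raising an error, so it may be worth checking anyNA(dat$QS) afterwards.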

Upvotes: 5

topchef

Reputation: 19793

Solution using merge:

lookupQ = data.frame(qcID=c(1,2,3), QS=c('foo', 'bar', 'baz'))
mergedDat = merge(dat, lookupQ, by.x="QC", by.y="qcID")
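Two things to keep in mind with merge(): by default it drops rows of dat whose QC has no match in the lookup table, and it sorts the result on the key instead of keeping the original row order. A sketch of one way to handle both, assuming a small example dat that contains an unmatched code 4 and a hypothetical helper column rowID (neither is from the original answer):

```r
dat <- data.frame(QC = c(3, 1, 4, 2, 1))
lookupQ <- data.frame(qcID = c(1, 2, 3), QS = c('foo', 'bar', 'baz'))

# Remember the original row order, since merge() sorts on the key
dat$rowID <- seq_len(nrow(dat))

# all.x = TRUE keeps rows of dat that have no match; their QS becomes NA
mergedDat <- merge(dat, lookupQ, by.x = "QC", by.y = "qcID", all.x = TRUE)

# Restore the original order and drop the helper column
mergedDat <- mergedDat[order(mergedDat$rowID), ]
mergedDat$rowID <- NULL
```

Whether QS comes back as character or factor depends on the R version's stringsAsFactors default, so wrap it in as.character() if the type matters.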

Upvotes: 1

marbel

Reputation: 7714

Using the data.table package:

library(data.table)
lkp <- data.table(qcIDs = 1:3, qcStrings = c('foo', 'bar', 'baz'))
dat <- data.table(QC = rep(1:3, 10e6))
setkey(dat,QC)
setkey(lkp,qcIDs)

result <- lkp[dat]

print(result)

#          qcIDs qcStrings
#        1:     1       foo
#        2:     1       foo
#        3:     1       foo
#        4:     1       foo
#        5:     1       foo
#       ---                
# 29999996:     3       baz
# 29999997:     3       baz
# 29999998:     3       baz
# 29999999:     3       baz
# 30000000:     3       baz


system.time(lkp[dat])
# user  system elapsed 
# 0.63    0.07    0.70 

Upvotes: 1
