Reputation: 1190
I have a dataframe, dat, where one of the columns, dat$QC, contains quality codes as integers. I want to add a new column, QS, containing the string describing the quality code for each row.
Here's what I have tried: I have stored the quality codes in a vector, qcIDs, and the strings in another vector, qcStrings. Then I loop over these and populate the new column in the dataframe accordingly, like this:
qcIDs <- c(1,2,3)
qcStrings <- c('foo', 'bar', 'baz')
for (ii in 1:length(qcIDs)) {
  dat$QS[dat$QC == qcIDs[ii]] <- qcStrings[ii]
}
I'm new to R and have often read that there are better ways of solving problems than with for loops. Is there a more R-ish way of approaching this? Does the above look as clumsy as it feels to me? Thanks.
Upvotes: 3
Views: 772
Reputation: 42669
Since it's most useful to end up with a factor in the data frame, simply create the factor using the given parameters. Here's an example:
(dat <- data.frame(QC=rep(c(1,2,3), 2)) )
## QC
## 1 1
## 2 2
## 3 3
## 4 1
## 5 2
## 6 3
Your parameters for the factor creation:
qcIDs <- c(1,2,3)
qcStrings <- c('foo', 'bar', 'baz')
Use these to encode a factor in dat:
dat$QC <- factor(dat$QC, levels=qcIDs, labels=qcStrings)
dat
## QC
## 1 foo
## 2 bar
## 3 baz
## 4 foo
## 5 bar
## 6 baz
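I didn't time this, but I would expect it to be faster than any sort of merge. factor() only matches each value against the short vector of levels and stores the resulting integer codes; no tables are joined. Note that this overwrites QC with the factor rather than adding a separate column.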
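If the original integer codes should be kept, as the question asks, the same call can fill a new column instead of overwriting QC. A minimal sketch, reusing the dat, qcIDs, and qcStrings from above; QS_chr is a hypothetical column name used only for illustration, and the as.character() step is needed only if a plain string column is wanted rather than a factor:

```r
dat <- data.frame(QC = rep(c(1, 2, 3), 2))
qcIDs <- c(1, 2, 3)
qcStrings <- c('foo', 'bar', 'baz')

# Keep the numeric codes in QC and put the labels in a new column QS
dat$QS <- factor(dat$QC, levels = qcIDs, labels = qcStrings)

# Optional: a character column instead of a factor
# (QS_chr is a made-up name for this example)
dat$QS_chr <- as.character(dat$QS)

# Codes that are not listed in qcIDs become NA instead of raising an error
unknown <- factor(c(1, 4), levels = qcIDs, labels = qcStrings)
```

The last line is worth keeping in mind when the data may contain codes that are missing from qcIDs: they turn into NA silently.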
Upvotes: 5
Reputation: 19793
Solution using merge:
lookupQ = data.frame(qcID=c(1,2,3), QS=c('foo', 'bar', 'baz'))
mergedDat = merge(dat, lookupQ, by.x="QC", by.y="qcID")
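Two things to watch with merge: by default it returns the result sorted on the key column, and it drops rows whose QC has no entry in the lookup table. If the original row order matters, a match()-based lookup is a common base R alternative. A small sketch with made-up data (the code 9 is deliberately absent from the lookup table):

```r
dat <- data.frame(QC = c(3, 1, 9, 2))
lookupQ <- data.frame(qcID = c(1, 2, 3), QS = c('foo', 'bar', 'baz'))

# Default merge: the result is sorted by QC and the unmatched row (QC == 9) is dropped
merged <- merge(dat, lookupQ, by.x = "QC", by.y = "qcID")

# match() gives the position of each QC in the lookup table (NA if absent),
# so the original row order is preserved and unmatched codes become NA
dat$QS <- as.character(lookupQ$QS)[match(dat$QC, lookupQ$qcID)]
```

Passing all.x = TRUE to merge would keep the unmatched row as well, with NA in QS.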
Upvotes: 1
Reputation: 7714
Using the data.table package:
library(data.table)  # library() stops with an error if the package is missing
lkp <- data.table(qcIDs = 1:3, qcStrings = c('foo', 'bar', 'baz'))
dat <- data.table(QC = rep(1:3, 10e6))
setkey(dat,QC)
setkey(lkp,qcIDs)
result <- lkp[dat]
print(result)
# qcIDs qcStrings
# 1: 1 foo
# 2: 1 foo
# 3: 1 foo
# 4: 1 foo
# 5: 1 foo
# ---
# 29999996: 3 baz
# 29999997: 3 baz
# 29999998: 3 baz
# 29999999: 3 baz
# 30000000: 3 baz
system.time(lkp[dat])
# user system elapsed
# 0.63 0.07 0.70
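The join above returns a new table. If the goal is to add the QS column to dat itself, as in the question, data.table's update join is one option. A minimal sketch on a small table, assuming a data.table version recent enough to support the on= argument:

```r
library(data.table)

lkp <- data.table(qcIDs = 1:3, qcStrings = c('foo', 'bar', 'baz'))
dat <- data.table(QC = rep(1:3, 2))

# For each row of dat, find the lkp row where QC == qcIDs and copy its
# qcStrings into a new column QS; dat is modified in place, without setkey()
dat[lkp, on = c(QC = "qcIDs"), QS := i.qcStrings]
```

Because the column is added by reference, dat keeps its original row order and no copy of the table is made.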
Upvotes: 1