Reputation: 985
I have a table with three columns: ABC, EFG, and HIJ. I would like to create a fourth column, KLM, that holds the product of EFG and HIJ when ABC is not missing, and NaN otherwise.
For now I am using a loop that takes about 15 minutes over 400,000 rows, which does not seem very idiomatic R to me. There must be a way to do this in significantly less time:
for (i in 1:nrow(df)) {
  if (!is.na(df$ABC[i])) {
    df$KLM[i] <- as.numeric(df$EFG[i] * df$HIJ[i])
  } else {
    df$KLM[i] <- NaN
  }
}
I have added the df:
ABC = c("NaN", 232,234,233,232.5)
EFG = c(12,12,12,12,12)
HIJ = c(10.75, 10.95, 11.25, 10.85, 10.55)
KLM = c(0,0,0,0,0)
df <- as.data.frame(cbind(ABC, EFG, HIJ, KLM))
df <- unfactor(df)
> df
    ABC EFG   HIJ KLM
1   NaN  12 10.75   0
2   232  12 10.95   0
3   234  12 11.25   0
4   233  12 10.85   0
5 232.5  12 10.55   0
Does anyone know how to simplify this and make it more efficient?
Upvotes: 1
Views: 38
Reputation: 3116
@jogo's solution mentioned in the comments is the best vectorized solution for data.frame.
Using data.table, it can be optimized as follows:
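library(data.table)
dt <- as.data.table(df)
dt[, KLM := rep(NaN, .N)]      # default value; full-length vector so KLM is stored as double
idx <- which(!is.na(dt$ABC))   # rows where ABC is present
set(dt, i = idx, j = "KLM", value = dt$EFG[idx] * dt$HIJ[idx])   # set() does not look up column names, so index explicitly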
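For comparison, here is a minimal sketch of what a vectorized base R version could look like. The comment by @jogo is not reproduced in this thread, so this is an assumption about the general approach rather than a copy of that solution. It also assumes the columns are already numeric, so the example builds the data frame directly instead of going through cbind().

```r
# Example data mirroring the question, built directly as numeric columns
# (assumption: ABC holds a real NaN, not the string "NaN")
df <- data.frame(
  ABC = c(NaN, 232, 234, 233, 232.5),
  EFG = c(12, 12, 12, 12, 12),
  HIJ = c(10.75, 10.95, 11.25, 10.85, 10.55)
)

# Vectorized: operate on whole columns at once instead of looping over rows.
# is.na() is TRUE for both NA and NaN, so those rows get NaN.
df$KLM <- ifelse(is.na(df$ABC), NaN, df$EFG * df$HIJ)

print(df)
```

Because the multiplication and the missing-value check each run once over the full columns, this avoids the per-row assignment into df that makes the loop slow.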
Upvotes: 1