nhern121

Reputation: 3919

What am I doing wrong (data.table, R)?

I'm currently doing some tests with the set function in the data.table package in R and have the following code:

  library(data.table)
  dt = data.table(ans = rep(c(14, 16), 100))
  dt[, voy := 0.0]
  set(dt, which(dt[, ans] == 14), "voy", log(dt[, ans]))
  dt

Note that I want to compute the logarithm of those cases having ans=14 using the set function, but I'm not getting the correct result. This is the result I got:

       ans      voy
    1:  14 2.639057
    2:  16 0.000000
    3:  14 2.772589
    4:  16 0.000000
    5:  14 2.639057
   ---
  196:  16 0.000000
  197:  14 2.639057
  198:  16 0.000000
  199:  14 2.772589
  200:  16 0.000000

Note that for some rows voy has the expected value log(14)=2.639057, but other rows with ans=14 are assigned 2.772589=log(16). So I think I'm misusing the set function. How can I fix this? I know the following code does the job:

dt[ans==14,voy:=log(ans)]

But I want to translate this into the set function syntax.

Upvotes: 2

Views: 2308

Answers (1)

ROLO

Reputation: 4223

You need to subset the data for the value parameter as well. In your case, the warning "Supplied 200 items to be assigned to 100 items of column 'voy' (100 unused)" could have given you an idea. log(dt[,ans]) returns 200 values, but only 100 rows are selected, so set assigned the first 100 values of log(dt$ans) one by one, and those alternate between log(14) and log(16).
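To see the mismatch concretely, here is a small sketch that only inspects the two arguments without calling set. The names rows and vals are mine, used for illustration only:

```r
library(data.table)

dt <- data.table(ans = rep(c(14, 16), 100))

rows <- which(dt$ans == 14)  # row numbers passed as i: 1, 3, 5, ...
vals <- log(dt$ans)          # values passed as value: one per row of dt

length(rows)  # number of rows to be assigned
length(vals)  # number of values supplied

# The first length(rows) values are paired with the selected rows in order,
# so row 3 (the second selected row) receives vals[2], which is log(16).
head(vals[seq_along(rows)], 4)
```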

This way it works:

set(dt,which(dt[,ans]==14),"voy",log(dt[ans==14,ans]))

giving:

     ans      voy
  1:  14 2.639057
  2:  16 0.000000
  3:  14 2.639057
  4:  16 0.000000
  5:  14 2.639057
 ---             
196:  16 0.000000
197:  14 2.639057
198:  16 0.000000
199:  14 2.639057
200:  16 0.000000

But it's ugly code, as @Andrie already remarked.
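A slightly tidier variant, as a sketch rather than the one true way: compute the row index once and reuse it for both i and value, so the two always have the same length and line up row by row. The name idx is my own choice:

```r
library(data.table)

dt <- data.table(ans = rep(c(14, 16), 100))
dt[, voy := 0.0]

# Row numbers where ans equals 14, computed a single time
idx <- which(dt$ans == 14)

# Assign log(ans) for exactly those rows; value has length(idx) elements
set(dt, i = idx, j = "voy", value = log(dt$ans[idx]))

dt
```

This should give the same result as dt[ans==14,voy:=log(ans)] from the question, only written with set.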

Upvotes: 4
