Cyan
Cyan

Reputation: 339

How to use for loop and if conditions together in data.table?

I have a data.table dt in R like this:

        ukey      time    orderType      a       edge        h1        h2
 1: 10600877   93003000       Buy       14.12      100      0.002      0.0002    
 2: 10600188   93003000       Buy       24.29      100      0.003      0.0003 
 3: 10600995   93003000       Buy       17.23      100      0.003      0.0003 
 4: 10600262   93003000      Sell       18.89      100      0.002      0.0004 
 5: 10600399   93003000      Sell       18.66      100      0.004      0.0002 
 6: 10603260   93003000      Sell       110.39     100      0.003      0.0002 
 7: 11000004   93003000      Sell       22.98      100      0.002      0.0003 
 ...

I want to get a new column called check by some conditions. My sample code looks like this:

for (i in 1:nrow(dt)) {
    if (dt[i, a - 1.5 * h1 > 111]) {
        dt[, check:= "A"]
        break
    }
    else if (dt[i, a+ edge - 1.3 * h1- 0.3 * h2> 111]) {
        dt[, check:= "B"]
        break
    }
    else if (dt[i, a- 1.5 * h1> 80]) {
        dt[, check:= "C"]
        break
    }
    else {dt[, check:= "D"]}
}

However, my code doesn't work well and it is inefficient. Could you give me some better solutions?Many thanks!

Upvotes: 1

Views: 263

Answers (2)

Sweepy Dodo
Sweepy Dodo

Reputation: 1873

What you are looking for is a faster ifelse equivalent

subset method

library(data.table)

setDT(dt)

# initialise
dt[, check := "D"]

# faster ifelse
dt[a- 1.5 * h1> 80, check := "C"
   ][a+ edge - 1.3 * h1- 0.3 * h2> 111, check := "B"
     ][a - 1.5 * h1 > 111, check := "A"
       ]

Notice the latter subset conditions take precedence. Thus, orders are reversed i.e. C, B, A. Anything not captured by our conditions are left with initialised value i.e. D

Similar post & performance gain vs dplyr

data.table's fcase

# courtesy to SO user @langtang
df[, check := fcase(a - 1.5 * h1 > 111, 'A'
                    , a + edge - 1.3 * h1 - 0.3 * h2 > 111, 'B'
                    , a - 1.5 * h1 > 80, 'C'
                    , default = 'D'
                    )]

notice with this method earlier conditions take precedence. Thus, order is A, B, C.

Upvotes: 2

Denny Chen
Denny Chen

Reputation: 499

You can try to use case_when in dplyr package.

dt[, check := case_when((a - 1.5 * h1) > 111 ~ "A",
                        (a+ edge - 1.3 * h1- 0.3 * h2)> 111 ~ "B",
                        (a- 1.5 * h1) > 80 ~ "C",
                        TRUE ~ "D")]

Result looks like :

> dt
       ukey     time orderType      a edge    h1    h2 check
1: 10600877 93003000       Buy  14.12  100 0.002 2e-04     B
2: 10600188 93003000       Buy  24.29  100 0.003 3e-04     B
3: 10600995 93003000       Buy  17.23  100 0.003 3e-04     B
4: 10600262 93003000      Sell  18.89  100 0.002 4e-04     B
5: 10600399 93003000      Sell  18.66  100 0.004 2e-04     B
6: 10603260 93003000      Sell 110.39  100 0.003 2e-04     B
7: 11000004 93003000      Sell  22.98  100 0.002 3e-04     B

Hope it is useful.

Upvotes: 2

Related Questions