sneaky_lobster
sneaky_lobster

Reputation: 385

How to do ifelse calculations based on first row by group in R data.table

I have the following data.table:

dt <- fread("
        PERIOD | EI_1 | EI_2 | EI_3 | EO_3 | GROUP
           0   |  1   |  1.5 | 1.75 |      |   A  
           1   |      |  1.4 |      |      |   A
           2   |      |  1.3 |      |      |   A
           3   |      |  1.2 |      |      |   A
           4   |      |  1.1 |      |      |   A
           0   |  0   |  0.5 | 0.75 |      |   B
           1   |      |  0.4 |      |      |   B
           2   |      |  0.3 |      |      |   B
           3   |      |  0.2 |      |      |   B  
           4   |      |  0.1 |      |      |   B
        ", 
        sep = "|",
        colClasses = c("EO_3" = "numeric"))

I want to do some lag-dependent calculations, defined by the following function:

calc_EO_3 <- function(PERIOD, EI_1, EI_2, EI_3){
  ifelse(
    PERIOD == 0,
    EI_3,
    ifelse(
      PERIOD <= 2,
      shift(EI_2, type="lag"),
      ifelse(
        EI_1[1] == 1,
        0.2 * shift(EI_2, type="lag"),
        20 * shift(EI_2, type="lag")
      )
    )
  )
}

Which should return the following DT:

dt[, EO_3 := calc_EO_3(PERIOD, EI_1, EI_2, EI_3), by = GROUP][]



 PERIOD EI_1 EI_2 EI_3 EO_3 GROUP
 1:      0    1  1.5 1.75 1.75     A
 2:      1   NA  1.4   NA 1.50     A
 3:      2   NA  1.3   NA 1.40     A
 4:      3   NA  1.2   NA 0.26     A
 5:      4   NA  1.1   NA 0.24     A
 6:      0    0  0.5 0.75 0.75     B
 7:      1   NA  0.4   NA 0.50     B
 8:      2   NA  0.3   NA 0.40     B
 9:      3   NA  0.2   NA 6.00     B
10:      4   NA  0.1   NA 4.00     B

But, instead, I get the following one:

 PERIOD EI_1 EI_2 EI_3 EO_3 GROUP
 1:      0    1  1.5 1.75 1.75     A
 2:      1   NA  1.4   NA 1.50     A
 3:      2   NA  1.3   NA 1.40     A
 4:      3   NA  1.2   NA   NA     A
 5:      4   NA  1.1   NA   NA     A
 6:      0    0  0.5 0.75 0.75     B
 7:      1   NA  0.4   NA 0.50     B
 8:      2   NA  0.3   NA 0.40     B
 9:      3   NA  0.2   NA   NA     B
10:      4   NA  0.1   NA   NA     B

The problem is that the function doesn't just check EI_1[1] == 1, it also makes the calculations to take place in the subset filtered by that condition.

How could I make the function to check a condition on the first row of a group, and then perform the calculations along that whole group based on the condition?

Upvotes: 1

Views: 152

Answers (3)

Frank
Frank

Reputation: 66819

Similar to @chinsoon's "alternatively..." answer:

dt[, `:=`(
  EO_3 = shift(EI_2, fill=first(EI_3)),
  mult = 2*10 ^ if (first(EI_1) == 1) -1 else 1
), by=.(GROUP)]

dt[PERIOD > 2, EO_3 := EO_3 * mult ]
dt[, mult := NULL]

    PERIOD EI_1 EI_2 EI_3 EO_3 GROUP
 1:      0    1  1.5 1.75 1.75     A
 2:      1   NA  1.4   NA 1.50     A
 3:      2   NA  1.3   NA 1.40     A
 4:      3   NA  1.2   NA 0.26     A
 5:      4   NA  1.1   NA 0.24     A
 6:      0    0  0.5 0.75 0.75     B
 7:      1   NA  0.4   NA 0.50     B
 8:      2   NA  0.3   NA 0.40     B
 9:      3   NA  0.2   NA 6.00     B
10:      4   NA  0.1   NA 4.00     B

Upvotes: 2

chinsoon12
chinsoon12

Reputation: 25225

You can use rep(EI_1[1L]==1, .N) to fix your code:

calc_EO_3 <- function(PERIOD, EI_1, EI_2, EI_3){
    ifelse(
        PERIOD == 0,
        EI_3,
        ifelse(
            PERIOD <= 2,
            shift(EI_2, type="lag"),
            ifelse(
                rep(EI_1[1]==1, .N),   #this is the change
                0.2 * shift(EI_2, type="lag"),
                20 * shift(EI_2, type="lag")
            )
        )
    )
}

dt[, EO_3 := calc_EO_3(PERIOD, EI_1, EI_2, EI_3), by = GROUP][]

output:

    PERIOD EI_1 EI_2 EI_3 EO_3 GROUP
 1:      0    1  1.5 1.75 1.75     A
 2:      1   NA  1.4   NA 1.50     A
 3:      2   NA  1.3   NA 1.40     A
 4:      3   NA  1.2   NA 0.26     A
 5:      4   NA  1.1   NA 0.24     A
 6:      0    0  0.5 0.75 0.75     B
 7:      1   NA  0.4   NA 0.50     B
 8:      2   NA  0.3   NA 0.40     B
 9:      3   NA  0.2   NA 6.00     B
10:      4   NA  0.1   NA 4.00     B

Alternatively,

dt[, EO_3 := 20 * shift(EI_2), by=.(GROUP)][
    GROUP %in% dt[EI_1==1L, GROUP], EO_3 := 0.2 * shift(EI_2), by=.(GROUP)][
        PERIOD <= 2L, EO_3 := shift(EI_2, fill=EI_3[1L]), by=.(GROUP)]

Note that there is a fifelse under development in rdatatable github repo.

Upvotes: 2

Hunaidkhan
Hunaidkhan

Reputation: 1418

you can do it using basic R ifelse condition . this will give you required output

library(dplyr)
df <-as.data.frame(dt)


df$EO_3 <- ifelse(df$PERIOD == 0, df$EI_3,ifelse(df$PERIOD <= 2 & df$PERIOD > 0 ,lag(df$EI_2,1),ifelse(df$EI_1 == 1 | df$PERIOD > 2,0.2*lag(df$EI_2,1),20*lag(df$EI_2,1))))

Upvotes: 0

Related Questions