Reputation: 385
I have the following data.table:
dt <- fread("
PERIOD | EI_1 | EI_2 | EI_3 | EO_3 | GROUP
0 | 1 | 1.5 | 1.75 | | A
1 | | 1.4 | | | A
2 | | 1.3 | | | A
3 | | 1.2 | | | A
4 | | 1.1 | | | A
0 | 0 | 0.5 | 0.75 | | B
1 | | 0.4 | | | B
2 | | 0.3 | | | B
3 | | 0.2 | | | B
4 | | 0.1 | | | B
",
sep = "|",
colClasses = c("EO_3" = "numeric"))
I want to do some lag-dependent calculations, defined by the following function:
calc_EO_3 <- function(PERIOD, EI_1, EI_2, EI_3){
ifelse(
PERIOD == 0,
EI_3,
ifelse(
PERIOD <= 2,
shift(EI_2, type="lag"),
ifelse(
EI_1[1] == 1,
0.2 * shift(EI_2, type="lag"),
20 * shift(EI_2, type="lag")
)
)
)
}
Which should return the following DT:
dt[, EO_3 := calc_EO_3(PERIOD, EI_1, EI_2, EI_3), by = GROUP][]
PERIOD EI_1 EI_2 EI_3 EO_3 GROUP
1: 0 1 1.5 1.75 1.75 A
2: 1 NA 1.4 NA 1.50 A
3: 2 NA 1.3 NA 1.40 A
4: 3 NA 1.2 NA 0.26 A
5: 4 NA 1.1 NA 0.24 A
6: 0 0 0.5 0.75 0.75 B
7: 1 NA 0.4 NA 0.50 B
8: 2 NA 0.3 NA 0.40 B
9: 3 NA 0.2 NA 6.00 B
10: 4 NA 0.1 NA 4.00 B
But, instead, I get the following one:
PERIOD EI_1 EI_2 EI_3 EO_3 GROUP
1: 0 1 1.5 1.75 1.75 A
2: 1 NA 1.4 NA 1.50 A
3: 2 NA 1.3 NA 1.40 A
4: 3 NA 1.2 NA NA A
5: 4 NA 1.1 NA NA A
6: 0 0 0.5 0.75 0.75 B
7: 1 NA 0.4 NA 0.50 B
8: 2 NA 0.3 NA 0.40 B
9: 3 NA 0.2 NA NA B
10: 4 NA 0.1 NA NA B
The problem is that the function doesn't just check EI_1[1] == 1
, it also makes the calculations to take place in the subset filtered by that condition.
How could I make the function to check a condition on the first row of a group, and then perform the calculations along that whole group based on the condition?
Upvotes: 1
Views: 152
Reputation: 66819
Similar to @chinsoon's "alternatively..." answer:
dt[, `:=`(
EO_3 = shift(EI_2, fill=first(EI_3)),
mult = 2*10 ^ if (first(EI_1) == 1) -1 else 1
), by=.(GROUP)]
dt[PERIOD > 2, EO_3 := EO_3 * mult ]
dt[, mult := NULL]
PERIOD EI_1 EI_2 EI_3 EO_3 GROUP
1: 0 1 1.5 1.75 1.75 A
2: 1 NA 1.4 NA 1.50 A
3: 2 NA 1.3 NA 1.40 A
4: 3 NA 1.2 NA 0.26 A
5: 4 NA 1.1 NA 0.24 A
6: 0 0 0.5 0.75 0.75 B
7: 1 NA 0.4 NA 0.50 B
8: 2 NA 0.3 NA 0.40 B
9: 3 NA 0.2 NA 6.00 B
10: 4 NA 0.1 NA 4.00 B
Upvotes: 2
Reputation: 25225
You can use rep(EI_1[1L]==1, .N)
to fix your code:
calc_EO_3 <- function(PERIOD, EI_1, EI_2, EI_3){
ifelse(
PERIOD == 0,
EI_3,
ifelse(
PERIOD <= 2,
shift(EI_2, type="lag"),
ifelse(
rep(EI_1[1]==1, .N), #this is the change
0.2 * shift(EI_2, type="lag"),
20 * shift(EI_2, type="lag")
)
)
)
}
dt[, EO_3 := calc_EO_3(PERIOD, EI_1, EI_2, EI_3), by = GROUP][]
output:
PERIOD EI_1 EI_2 EI_3 EO_3 GROUP
1: 0 1 1.5 1.75 1.75 A
2: 1 NA 1.4 NA 1.50 A
3: 2 NA 1.3 NA 1.40 A
4: 3 NA 1.2 NA 0.26 A
5: 4 NA 1.1 NA 0.24 A
6: 0 0 0.5 0.75 0.75 B
7: 1 NA 0.4 NA 0.50 B
8: 2 NA 0.3 NA 0.40 B
9: 3 NA 0.2 NA 6.00 B
10: 4 NA 0.1 NA 4.00 B
Alternatively,
dt[, EO_3 := 20 * shift(EI_2), by=.(GROUP)][
GROUP %in% dt[EI_1==1L, GROUP], EO_3 := 0.2 * shift(EI_2), by=.(GROUP)][
PERIOD <= 2L, EO_3 := shift(EI_2, fill=EI_3[1L]), by=.(GROUP)]
Note that there is a fifelse
under development in rdatatable github repo.
Upvotes: 2
Reputation: 1418
you can do it using basic R ifelse condition . this will give you required output
library(dplyr)
df <-as.data.frame(dt)
df$EO_3 <- ifelse(df$PERIOD == 0, df$EI_3,ifelse(df$PERIOD <= 2 & df$PERIOD > 0 ,lag(df$EI_2,1),ifelse(df$EI_1 == 1 | df$PERIOD > 2,0.2*lag(df$EI_2,1),20*lag(df$EI_2,1))))
Upvotes: 0