Reputation: 23
I've been reading a lot about lag or accumulate but still can't figure out how do work on mine. I've worked out a simplified data of mine like below and also my attempt of workaround of this
library(tidyverse)
library(lubridate)
currDt = ymd(20211130)
polyData = structure(list(CCDATE = structure(c(18716, 18715, 18713, 18712, 8895, 18498, 18690, 18689, 18688, 18323), class = "Date"),
CRDATE = structure(c(19080, 18898, 18896, 18895, 18895, 18862, 19054, 19053, 19052, 18687), class = "Date"),
mREGNO = c("BBB", "BBB", "BBB", "BBB", "BBB", "BBB", "DDD", "DDD", "DDD", "DDD"),
SCHDRNUM = c(7711858, 7710223, 7709517, 7709318, 7709172, 7487762, 7686649, 7684567, 7683420, 7374917),
MCCDATE = structure(c(18686, 18686, 18685, 18684, 18864, 18467, 18662, 18661, 18660, 18294), class = "Date")),
row.names = c(NA, -10L),
class = c("tbl_df", "tbl", "data.frame"))
# sort accordingly for row processing next step
# by REGNO and latest CHDRNUM
polyData = polyData %>%
arrange(mREGNO, -SCHDRNUM)
n = polyData %>%
group_by(mREGNO) %>%
tally() %>%
ungroup() %>%
summarise(n = max(n))
n = n$n
polyData_backUp = polyData
# get carried backward CCDATE minus 1 month
# first round is temporary create columns for later use
polyData = polyData %>%
mutate(# temporary Keep for latest tran of every mREGNO
Keep = case_when(is.na(lag(mREGNO)) ~ 1, # latest tran or data
mREGNO != lag(mREGNO) ~ 1, # latest tran of every mREGNo
TRUE ~ 0),
# temporary Carried_MCCDATE for latest and second latest tran of every mREGNO
Carried_MCCDATE = case_when(is.na(lag(mREGNO)) ~ MCCDATE, # latest tran of data
mREGNO != lag(mREGNO) ~ MCCDATE, # latest tran of every mREGNo
lag(Keep) == 1 ~ lag(MCCDATE), # if latest tran is ok then only carry backwards
TRUE ~ currDt)) # temporary as at date
# keep calculate base on the updated info for every rows
for (i in 2:n) {
polyData = polyData %>%
mutate(Keep = case_when(is.na(lag(mREGNO)) ~ 1, # latest tran or data
mREGNO != lag(mREGNO) ~ 1, # latest tran of every mREGNo
CCDATE <= Carried_MCCDATE ~ 1, # previous CCDATE must have at least one month gap
TRUE ~ 0),
Carried_MCCDATE = case_when(is.na(lag(mREGNO)) ~ MCCDATE, # latest tran of data
mREGNO != lag(mREGNO) ~ MCCDATE, # latest tran of every mREGNo
lag(Keep) == 1 ~ lag(MCCDATE), # if latest tran is ok then only carry backwards
TRUE ~ lag(Carried_MCCDATE))) # if latest tran is not ok then will carry from carried
}
Basically columns from CCDATE to MCCDATE are the original data sorted by mREGNO and SCHDRNUM (Descending); Columns from Carried_MCCDATE to keep are the ideal result I would want.
The criteria is as below and I've worked it out in excel and have try lag in R but it doesn't work. The reason is because I need to take care of recursive formula at Carried_MCCDATE and another depending column together.
For this sample with this small, I manage to do it with a loop but is there anyway I can achieve this by using tidyverse? Because my actual data is a lot bigger.
The Carried_MCCDATE formula would be:
1) If first row of different mREGNO then = MCCDATE
2) If not first row and previous row's keep = 1 then = previous row's MCCDATE
3) Else = previous row's Carried_MCCDATE
Excel formula for cell F1 =IF(C2<>C1,E2,IF(G1=1,E1,F1))
The Keep formula would be:
1) If first row of different mREGNO then = 1
2) If CCDATE <= Carried_MCCDATE then = 1
3) Else = 0
Excel formula for cell G1 =IF(C2<>C1,1,IF(A2<=F2,1,0))
Please help and thank you very much in advance!!
Upvotes: 1
Views: 196
Reputation: 269491
Assuming that the input is polyData_backUp
We can use accumulate
from purrr.
library(dplyr)
library(lubridate)
library(purrr)
polyData_backUp %>%
group_by(mREGNO) %>%
mutate(carried = as_date(accumulate(2:n(), .init = first(MCCDATE),
function(carried, i) if (CCDATE[i-1] <= carried) MCCDATE[i-1] else carried))) %>%
ungroup
giving:
# A tibble: 10 x 6
CCDATE CRDATE mREGNO SCHDRNUM MCCDATE carried
<date> <date> <chr> <dbl> <date> <date>
1 2021-03-30 2022-03-29 BBB 7711858 2021-02-28 2021-02-28
2 2021-03-29 2021-09-28 BBB 7710223 2021-02-28 2021-02-28
3 2021-03-27 2021-09-26 BBB 7709517 2021-02-27 2021-02-28
4 2021-03-26 2021-09-25 BBB 7709318 2021-02-26 2021-02-28
5 1994-05-10 2021-09-25 BBB 7709172 2021-08-25 2021-02-28
6 2020-08-24 2021-08-23 BBB 7487762 2020-07-24 2021-08-25
7 2021-03-04 2022-03-03 DDD 7686649 2021-02-04 2021-02-04
8 2021-03-03 2022-03-02 DDD 7684567 2021-02-03 2021-02-04
9 2021-03-02 2022-03-01 DDD 7683420 2021-02-02 2021-02-04
10 2020-03-02 2021-03-01 DDD 7374917 2020-02-02 2021-02-04
Upvotes: 1