Reputation: 29
This may seem trivial but I am really stuck at this problem of comparing a value with this complex string
My data frame looks like this:
Id | History | Report Month |
---|---|---|
1001 | Jun:2020,030/XXX-May:2020,035/XXX-Apr:2020,040/XXX-Mar:2020,060/XXX | July 2021 |
1003 | Jun:2017,823/XXX-May:2017,000/XXX-Apr:2017,000/XXX-Mar:2017,000/XXX | July 2021 |
1005 | Apr:2019,000/XXX-Mar:2019,800/XXX-Feb:2019,000/XXX-Jan:2019,000/XXX | July 2021 |
1006 | Jun:2020,000/XXX-May:2020,030/XXX-Apr:2020,060/XXX-Mar:2020,090/XXX | July 2021 |
Key, value pair from the column history that will be used in comparison are as following:
Id : 1001 - Jun 2020,030 May 2020, 035 Apr 2020, 040......
Id : 1003 - Jun 2017,823 May 2017, 000 Apr 2017, 000......
Problem statement is: I want to compare these key, value pair with the report month (i.e. always current month) and make a conditional column based on it. Logic is: 24 months (could be 12 or 36) preceding July 2021 i.e July 2021-Jun 2019, how many key,value pairs have value >= 30 or >= 60 etc for months that lie within this time period. So if a string starts from <Jun 2019, like for 1003, the answer should be 0.
Output
Id | Report Month | +30_last_24 | +30_last_36 |
---|---|---|---|
1001 | July 2021 | 4 | 4 |
1003 | July 2021 | 0 | 0 |
1005 | July 2021 | 0 | 1 |
1006 | July 2021 | 3 | 3 |
I started with R very recently and have no solution to even begin with, so any help would be deeply appreciated.
MODIFIED ORIGINAL DATASET
df <- read.table(header = T, text = "Id History ReportMonth
1001 Jun:2020,030/XXX|May:2020,035/XXX|Apr:2020,040/XXX|Mar:2020,060/XXX 'July 2021'
1003 Jun:2017,DDD/XXX|May:2017,030/XXX|Apr:2017,DDD/STD|Mar:2017,000/XXX 'July 2021'
1005 Apr:2019,000/XXX|Mar:2019,800/DDD|Feb:2019,000/XXX|Jan:2019,000/XXX 'July 2021'
1006 Jun:2020,000/XXX|May:2020,030/XXX|Apr:2020,060/XXX|Mar:2020,090/XXX 'July 2021'")
Upvotes: 1
Views: 137
Reputation: 26238
Revised Strategy in view of modifications-
|
but only after escaping it with\\
,
gsub
Feel free to ask clarifications, if any.
df <- read.table(header = T, text = "Id History ReportMonth
1001 Jun:2020,030/XXX|May:2020,035/XXX|Apr:2020,040/XXX|Mar:2020,060/XXX 'July 2021'
1003 Jun:2017,DDD/XXX|May:2017,030/XXX|Apr:2017,DDD/STD|Mar:2017,000/XXX 'July 2021'
1005 Apr:2019,000/XXX|Mar:2019,800/DDD|Feb:2019,000/XXX|Jan:2019,000/XXX 'July 2021'
1006 Jun:2020,000/XXX|May:2020,030/XXX|Apr:2020,060/XXX|Mar:2020,090/XXX 'July 2021'")
library(tidyverse)
library(lubridate, warn.conflicts = F)
df %>%
separate_rows(History, sep = '\\|') %>%
separate(History, into = c('Hist_mon', 'Hist_val'), sep = ',') %>%
mutate(Hist_mon = dmy(paste0('1:', Hist_mon)),
Hist_val = as.numeric(gsub('(\\D*)', '', Hist_val)),
ReportMonth = dmy(paste0('1 ', ReportMonth))) %>%
group_by(Id, ReportMonth) %>%
summarise(last_24_30 = sum(Hist_val >= 30 & Hist_mon >= ReportMonth %m-% months(24)),
last_36_30 = sum(Hist_val >= 30 & Hist_mon >= ReportMonth %m-% months(36)), .groups = 'drop')
#> # A tibble: 4 x 4
#> Id ReportMonth last_24_30 last_36_30
#> <int> <date> <int> <int>
#> 1 1001 2021-07-01 4 4
#> 2 1003 2021-07-01 0 0
#> 3 1005 2021-07-01 0 1
#> 4 1006 2021-07-01 3 3
Created on 2021-07-16 by the reprex package (v2.0.0)
Upvotes: 2
Reputation: 79358
library(tidyverse)
library(lubridate)
df %>%
separate_rows(History, sep = '[|]')%>%
filter(str_detect(History, "\\w"), str_detect(History, "\\d+/"))%>%
separate(History, c("Date", "Value", "d"), sep = '[,/]', convert = TRUE) %>%
mutate(across(c(Date,ReportMonth), ~myd(paste(.x, "01")))) %>%
group_by(Id) %>%
summarise(r = list(map(c(m24 = 24, m36 = 36), ~sum(
Date + months(.x) > ReportMonth & Value >= 30)))) %>%
unnest_wider(r) %>%
right_join(df, 'Id')
# A tibble: 4 x 5
Id m24 m36 History_Report Month
<int> <int> <int> <chr> <chr>
1 1001 4 4 Jun:2020,030/XXX-May:2020,035/XXX-Apr:2020,040/XXX-Mar:2020,060/XXX July 2021
2 1003 0 0 Jun:2017,823/XXX-May:2017,000/XXX-Apr:2017,000/XXX-Mar:2017,000/XXX July 2021
3 1005 0 1 Apr:2019,000/XXX-Mar:2019,800/XXX-Feb:2019,000/XXX-Jan:2019,000/XXX July 2021
4 1006 3 3 Jun:2020,000/XXX-May:2020,030/XXX-Apr:2020,060/XXX-Mar:2020,090/XXX July 2021
Upvotes: 1