Reputation: 2877
I wish to create a new column which contains the minimum date per group where some conditions are met.
My data looks as follows:
mbr <- c('A','A','A','A','B','B','B')
drg_typ <- c('TGT','TGT','TGT','Other','Other','TGT','TGT')
dt <- as.Date(c('2018-01-01','2019-06-30','2019-03-18','2017-01-01','2018-01-01','2016-01-01','2019-05-01'))
df <- data.frame(mbr,drg_typ,dt)
mbr drg_typ dt
A TGT 2018-01-01
A TGT 2019-06-30
A TGT 2019-03-18
A Other 2017-01-01
B Other 2018-01-01
B TGT 2016-01-01
B TGT 2019-05-01
I wish to mutate a new column called min_dt which is performed at the group level of mbr
using the following logic:
For each mbr where the drg_typ = 'TGT' and the dt is between 2019-01-01 and 2019-12-31 fill a new column called min_dt with the minimum value of dt which falls between the date values above.
I have tried:
df <- df %>%
group_by(mbr) %>%
mutate(min_dt = if_else(drg_typ == 'TGT' & dt >= '2019-01-01' & dt <= '2019-12-31', min(dt),0))
but I receive the following error:
Error in as.Date.numeric(value) : 'origin' must be supplied
I have checked the structure of my data frame and dt
is a date
> str(df)
'data.frame': 7 obs. of 3 variables:
$ mbr : Factor w/ 2 levels "A","B": 1 1 1 1 2 2 2
$ drg_typ: Factor w/ 2 levels "Other","TGT": 2 2 2 1 1 2 2
$ dt : Date, format: "2018-01-01" "2019-06-30" "2019-03-18" "2017-01-01" ...
My final output should look as follows:
mbr drg_typ dt min_dt
A TGT 2018-01-01 2019-03-18
A TGT 2019-06-30 2019-03-18
A TGT 2019-03-18 2019-03-18
A Other 2017-01-01 2019-03-18
B Other 2018-01-01 2019-05-01
B TGT 2016-01-01 2019-05-01
B TGT 2019-05-01 2019-05-01
Upvotes: 2
Views: 1431
Reputation: 887891
The 0
is an issue and if_else
checks the type. Here the true
is of type Date
class while false
is numeric. Instead it can be NA
and convert it to Date
with as.Date
library(dplyr)
df %>%
group_by(mbr) %>%
mutate(min_dt = if_else(drg_typ == 'TGT' &
dt >= '2019-01-01' & dt <= '2019-12-31', min(dt), as.Date(NA)))
Based on the expected output, we don't need if_else
here. The min
of 'dt' can be based on a logical expression
df %>%
group_by(mbr) %>%
mutate(min_dt = min(dt[drg_typ == 'TGT' &
between(dt, as.Date('2019-01-01'), as.Date('2019-12-31'))]))
# A tibble: 7 x 4
# Groups: mbr [2]
# mbr drg_typ dt min_dt
# <fct> <fct> <date> <date>
#1 A TGT 2018-01-01 2019-03-18
#2 A TGT 2019-06-30 2019-03-18
#3 A TGT 2019-03-18 2019-03-18
#4 A Other 2017-01-01 2019-03-18
#5 B Other 2018-01-01 2019-05-01
#6 B TGT 2016-01-01 2019-05-01
#7 B TGT 2019-05-01 2019-05-01
Or using data.table
library(data.table)
setDT(df)[, min_dt := min(dt[drg_typ == 'TGT' &
between(dt, as.Date('2019-01-01'), as.Date('2019-12-31'))]), mbr]
Upvotes: 3