maldini425

Reputation: 317

Creating a Categorical Y Variable Using Dates

I am using an administrative dataset for a welfare program that provides a wage subsidy for workers. I am trying to create a binary Y variable that equals 1 for a person who no longer receives the subsidy and 0 for a person who is currently receiving it (that is, where end_date is NA). I plan to build it from two variables: start_date and end_date.

I have tried the following code, but I am getting an error message:

train_worker_subsidy5_categorical_y = train_worker_subsidy5 %>% 
  mutate(left_welfare = numeric(is.na(end_date)))
test_worker_subsidy5_categorical_y = test_worker_subsidy5 %>%
  mutate(left_welfare = numeric(is.na(end_date)))

The error message is:

Error in numeric(is.na(end_date)) : invalid 'length' argument

Upvotes: 0

Views: 31

Answers (1)

Patrick25

Reputation: 121

If I am understanding your question correctly, I would use this approach.

library(dplyr)

df <- data.frame(start_date = as.Date(c('2018-01-01', '2019-02-01',
                                        '2019-03-01', '2019-04-01')),
                 end_date = as.Date(c('2019-01-01', NA, '2019-08-01',
                                      '2020-01-01')))

today <- Sys.Date()

# receiving is 1 once end_date has passed, 0 if end_date is missing or still in the future
df %>% mutate(receiving = if_else(is.na(end_date), 0,
                          if_else(end_date > today, 0, 1)))

  start_date   end_date receiving
1 2018-01-01 2019-01-01         1
2 2019-02-01       <NA>         0
3 2019-03-01 2019-08-01         1
4 2019-04-01 2020-01-01         0
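
On the error itself: numeric() takes a length and returns a vector of zeros, so handing it the logical vector from is.na(end_date) fails with "invalid 'length' argument". If a missing end_date is the only thing that marks someone as still receiving the subsidy (that is my assumption, taken from your description), something like the sketch below might be all you need. The toy data frame and its values are made up for illustration.

```r
library(dplyr)

# Toy data standing in for train_worker_subsidy5 (hypothetical values)
toy <- data.frame(
  start_date = as.Date(c("2018-01-01", "2019-02-01", "2019-03-01")),
  end_date   = as.Date(c("2019-01-01", NA, "2019-08-01"))
)

# numeric(n) builds a zero vector of length n; it does not convert types.
# as.integer() converts the logical vector instead: TRUE -> 1, FALSE -> 0.
# Assumption: a non-missing end_date means the person has left the program.
toy_y <- toy %>%
  mutate(left_welfare = as.integer(!is.na(end_date)))

toy_y$left_welfare
```

The same mutate() call should work unchanged on both your train and test data frames. Unlike the Sys.Date() comparison, it does not depend on the day you run it, but it also treats a future end_date as having already left.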

It is hard to fully understand the question without a reproducible example. Hope this helps.

Upvotes: 1
