Doug Fir

Reputation: 21212

Conditionally change the year part of a date within a dplyr chain

x <- structure(list(SU_BIRTH_DATE = structure(c(47482, 2884, 11347, 
10449, -1280, 2324), class = "Date")), row.names = c(NA, 6L), class = "data.frame", .Names = "SU_BIRTH_DATE")

 x
  SU_BIRTH_DATE
1    2100-01-01
2    1977-11-24
3    2001-01-25
4    1998-08-11
5    1966-07-01
6    1976-05-13

From looking over the dataset, it's clear that many people made a typo in their date of birth, entering 2100 instead of 2001 as the year.

I want to replace the year 2100 in any date with 2001.

How can I do that?

x <- x %>% 
  mutate(SU_BIRTH_DATE = if_else(year(SU_BIRTH_DATE) == 2100, year(SU_BIRTH_DATE) = 2001,SU_BIRTH_DATE))

Error: unexpected '=' in: "x <- x %>% mutate(SU_BIRTH_DATE = if_else(year(SU_BIRTH_DATE) == 2100, year(SU_BIRTH_DATE) ="

EDIT: Converting to character, applying str_replace_all, and converting back to Date has worked in the meantime, but surely there is a smarter way that needs less code?

x <- x %>% mutate(SU_BIRTH_DATE = str_replace_all(as.character(SU_BIRTH_DATE), "2100", "2001"),
                  SU_BIRTH_DATE = ymd(SU_BIRTH_DATE))

Upvotes: 2

Views: 1231

Answers (3)

MKR

Reputation: 20095

One option is to use the year() function from lubridate to check the year and then assign the corrected year back. I have used a custom function to make the replacement explicit. This keeps the dplyr chain unbroken and requires no conversion to character.

Alternatively, case_when() removes the need for a custom function.

Option #1

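library(dplyr)
library(lubridate)

# Loop over the dates and reset the year wherever it is 2100
replace_year <- function(x){
  for(i in seq_along(x)){
    # skip missing dates, which would otherwise make if() fail
    if(!is.na(x[i]) && year(x[i]) == 2100){
      year(x[i]) <- 2001
    }
  }
  x
}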


x %>% mutate(SU_BIRTH_DATE = replace_year(SU_BIRTH_DATE))
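The loop in Option #1 can also be written without iterating, because lubridate's `year<-` works on a whole vector. This is a minimal sketch, assuming dplyr and lubridate are installed; replace_year_vec and its from/to arguments are hypothetical names used only for illustration:

```r
library(dplyr)
library(lubridate)

# Hypothetical vectorised variant of replace_year():
# fixes every date whose year equals `from` in a single assignment.
replace_year_vec <- function(x, from = 2100, to = 2001) {
  idx <- which(year(x) == from)  # which() silently skips NA dates
  if (length(idx) > 0) {
    year(x[idx]) <- to           # nested replacement on the selected dates only
  }
  x
}

x <- data.frame(SU_BIRTH_DATE = as.Date(c("2100-01-01", "1977-11-24", NA)))
x %>% mutate(SU_BIRTH_DATE = replace_year_vec(SU_BIRTH_DATE))
```

Selecting the rows with which() first means missing dates are left untouched instead of breaking the comparison.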

Option #2: use case_when() instead of a custom function

x %>% mutate(SU_BIRTH_DATE = case_when(
  year(SU_BIRTH_DATE) == 2100 ~ `year<-`(SU_BIRTH_DATE, 2001),
  TRUE ~ SU_BIRTH_DATE
  ))

# SU_BIRTH_DATE
# 1    2001-01-01
# 2    1977-11-24
# 3    2001-01-25
# 4    1998-08-11
# 5    1966-07-01
# 6    1976-05-13 
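The same `year<-` call also repairs the if_else() attempt from the question. Inside a function call, R only accepts a plain name on the left of `=` (it is read as a named argument), so `year(SU_BIRTH_DATE) = 2001` triggers the "unexpected '='" error. Calling the replacement function by its backtick name instead just returns a modified copy of the vector. A minimal sketch, assuming dplyr and lubridate are installed:

```r
library(dplyr)
library(lubridate)

x <- data.frame(SU_BIRTH_DATE = as.Date(c("2100-01-01", "1977-11-24", "2001-01-25")))

# `year<-`(d, 2001) returns a copy of d with every year set to 2001;
# if_else() keeps that copy only where the original year was 2100.
x <- x %>%
  mutate(SU_BIRTH_DATE = if_else(year(SU_BIRTH_DATE) == 2100,
                                 `year<-`(SU_BIRTH_DATE, 2001),
                                 SU_BIRTH_DATE))
x
```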

Upvotes: 1

De Novo

Reputation: 7610

Sometimes you have to step outside a dplyr pipe. If you want to use the replacement form of lubridate::year to assign a new year (a reasonable desire), it doesn't fit naturally inside the pipe, because year(...) <- value modifies an object in place. Do this instead:

# compare the year (not the raw Date) with 2100; with() would only modify a temporary copy
year(x$SU_BIRTH_DATE[which(year(x$SU_BIRTH_DATE) == 2100)]) <- 2001
x
  SU_BIRTH_DATE
1    2001-01-01
2    1977-11-24
3    2001-01-25
4    1998-08-11
5    1966-07-01
6    1976-05-13
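If lubridate is not available, a base-R sketch of the same in-place fix can go through POSIXlt, whose year component counts years since 1900 (so 2100 is stored as 200 and 2001 as 101). This swaps in a different technique from the answer above and is only an illustration:

```r
x <- data.frame(SU_BIRTH_DATE = as.Date(c("2100-01-01", "1977-11-24", NA)))

# Break the dates into calendar components
lt <- as.POSIXlt(x$SU_BIRTH_DATE)
# $year is years since 1900: 2100 -> 200, 2001 -> 101; which() skips NA dates
idx <- which(lt$year == 200)
lt$year[idx] <- 101L
# Convert back to Date and store the corrected column
x$SU_BIRTH_DATE <- as.Date(lt)
x
```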

Upvotes: 1

Maurits Evers

Reputation: 50678

You could use gsub to replace "2100" with "2001" and then convert the result back to a Date with as.Date.

x %>% mutate(SU_BIRTH_DATE = as.Date(gsub("2100", "2001", SU_BIRTH_DATE)))
#  SU_BIRTH_DATE
#1    2001-01-01
#2    1977-11-24
#3    2001-01-25
#4    1998-08-11
#5    1966-07-01
#6    1976-05-13

I admit, this is similar to your str_replace_all approach, albeit a bit shorter.

Upvotes: 2
