Reputation: 21212
x <- structure(list(SU_BIRTH_DATE = structure(c(47482, 2884, 11347,
10449, -1280, 2324), class = "Date")), row.names = c(NA, 6L), class = "data.frame", .Names = "SU_BIRTH_DATE")
x
SU_BIRTH_DATE
1 2100-01-01
2 1977-11-24
3 2001-01-25
4 1998-08-11
5 1966-07-01
6 1976-05-13
Looking over a dataset, it's clear that many people made a typo when entering their date of birth: they typed 2100 instead of 2001 for the year part.
I want to replace any 2100 year parts of a date field with 2001.
How can I do that?
x <- x %>%
mutate(SU_BIRTH_DATE = if_else(year(SU_BIRTH_DATE) == 2100, year(SU_BIRTH_DATE) = 2001,SU_BIRTH_DATE))
Error: unexpected '=' in: "x <- x %>% mutate(SU_BIRTH_DATE = if_else(year(SU_BIRTH_DATE) == 2100, year(SU_BIRTH_DATE) ="
EDIT: Converting to character, using str_replace_all, and then converting back to a date is a solution that works in the meantime, but I'm sure there's a smarter way that needs less code?
x <- x %>% mutate(SU_BIRTH_DATE = str_replace_all(as.character(SU_BIRTH_DATE), "2100", "2001"),
SU_BIRTH_DATE = ymd(SU_BIRTH_DATE))
Upvotes: 2
Views: 1231
Reputation: 20095
One option is to use the year function from lubridate to check the year and then assign a new one. I have used a custom function to illustrate the replacement. This keeps the dplyr chain intact and avoids any conversion to character.
The custom function can be avoided by using case_when.
Option #1
replace_year <- function(x){
  # Walk the vector and reset the year of every date that falls in 2100
  for(i in seq_along(x)){
    if(year(x[i]) == 2100){
      year(x[i]) <- 2001
    }
  }
  x
}
x %>% mutate(SU_BIRTH_DATE = replace_year(SU_BIRTH_DATE))
Option #2: avoid the custom function by using case_when
x %>% mutate(SU_BIRTH_DATE = case_when(
year(SU_BIRTH_DATE) == 2100 ~ `year<-`(SU_BIRTH_DATE, 2001),
TRUE ~ SU_BIRTH_DATE
))
# SU_BIRTH_DATE
# 1 2001-01-01
# 2 1977-11-24
# 3 2001-01-25
# 4 1998-08-11
# 5 1966-07-01
# 6 1976-05-13
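The if_else attempt in the question fails because an assignment such as year(SU_BIRTH_DATE) = 2001 cannot be passed as an argument. As a minimal sketch, assuming dplyr and lubridate are loaded, the same replacement-function call used in Option #2 can be dropped into if_else instead. The three-row data frame and the name fixed are made up for illustration.

```r
library(dplyr)
library(lubridate)

# Small illustrative sample, not the full dataset from the question
x <- data.frame(
  SU_BIRTH_DATE = as.Date(c("2100-01-01", "1977-11-24", "2001-01-25"))
)

fixed <- x %>%
  mutate(SU_BIRTH_DATE = if_else(
    year(SU_BIRTH_DATE) == 2100,
    `year<-`(SU_BIRTH_DATE, 2001),  # functional form of year(d) <- 2001
    SU_BIRTH_DATE                   # all other rows are kept unchanged
  ))

print(fixed)
```

Both branches are Date vectors, which is what if_else needs in order to combine them.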
Upvotes: 1
Reputation: 7610
Sometimes you have to get out of a dplyr pipe. If you want to use lubridate::year to assign a new year (a reasonable desire), it won't operate nicely inside the pipe. Do this instead:
year(x$SU_BIRTH_DATE[year(x$SU_BIRTH_DATE) == 2100]) <- 2001
x
SU_BIRTH_DATE
1 2001-01-01
2 1977-11-24
3 2001-01-25
4 1998-08-11
5 1966-07-01
6 1976-05-13
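If lubridate is not available, the same idea of working outside the pipe can be sketched in base R by editing the year field of a POSIXlt object. The helper name swap_year and the three-row sample are hypothetical and only meant as an illustration.

```r
# Hypothetical helper (not from the thread): swap one calendar year for another
swap_year <- function(d, from = 2100L, to = 2001L) {
  lt <- as.POSIXlt(d)                          # broken-down time; $year counts from 1900
  hit <- !is.na(d) & (lt$year + 1900L) == from # rows whose calendar year equals `from`
  lt$year[hit] <- to - 1900L
  as.Date(lt)
}

# Small illustrative sample, not the full dataset from the question
x <- data.frame(
  SU_BIRTH_DATE = as.Date(c("2100-01-01", "1977-11-24", "1966-07-01"))
)
x$SU_BIRTH_DATE <- swap_year(x$SU_BIRTH_DATE)
print(x)
```

Month and day are left as they are, and missing dates are skipped by the is.na check.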
Upvotes: 1
Reputation: 50678
You could use gsub to replace "2100" with "2001", then convert the result back with as.Date.
x %>% mutate(SU_BIRTH_DATE = as.Date(gsub("2100", "2001", SU_BIRTH_DATE)));
# SU_BIRTH_DATE
#1 2001-01-01
#2 1977-11-24
#3 2001-01-25
#4 1998-08-11
#5 1966-07-01
#6 1976-05-13
I admit, this is similar to your str_replace_all approach, albeit a bit shorter.
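A small variation, sketched here in base R only, anchors the pattern to the start of the string so it is explicit that only the year field is being rewritten. The sample data and the name as_text are illustrative assumptions.

```r
# Small illustrative sample, not the full dataset from the question
x <- data.frame(
  SU_BIRTH_DATE = as.Date(c("2100-01-01", "1977-11-24", "2001-01-25"))
)

as_text <- format(x$SU_BIRTH_DATE)   # ISO text, e.g. "2100-01-01"
# "^" ties the match to the start of the string, i.e. the year field
x$SU_BIRTH_DATE <- as.Date(sub("^2100-", "2001-", as_text))
print(x)
```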
Upvotes: 2