Reputation: 1
I have a date column that contains two dates (e.g Start Date 1/1/13 to End Date 12/31/13")and some rows contain different format like (e.g.1/1/13 to 12/31/13), (October to May)
I want to unify the format to MM/DD/YYYY and separate the start and end date into two columns.
How can I get rid of the characters and separate the two dates and place them into two separate columns like the picture attached?
Can this be achieved in R?
Upvotes: 0
Views: 616
Reputation: 41210
You could use regular expressions with stringr
and lubridate
:
df <- data.frame(range = c("1/1/13 to 12/31/13",
"5/5/15 to 10/27/15"))
df$from <- lubridate::mdy(stringr::str_extract(df$range,"^.*?(?=to)"))
df$to <- lubridate::mdy(stringr::str_extract(df$range,"(?=to).*?$"))
df
#> range from to
#> 1 1/1/13 to 12/31/13 2013-01-01 2013-12-31
#> 2 5/5/15 to 10/27/15 2015-05-05 2015-10-27
Created on 2020-09-20 by the reprex package (v0.3.0)
Or without converting to date :
library(dplyr)
df <- data.frame(range = c("1/1/13 to 12/31/13",
"5/5/15 to 10/27/15",
"October to November"))
df %>% mutate(from = stringr::str_extract(range,"^.*?(?= to)"),
to = stringr::str_extract(range,"(?<=to ).*?$"))
#> range from to
#> 1 1/1/13 to 12/31/13 1/1/13 12/31/13
#> 2 5/5/15 to 10/27/15 5/5/15 10/27/15
#> 3 October to November October November
Created on 2020-09-20 by the reprex package (v0.3.0)
Upvotes: 1
Reputation: 2232
You should be able to do it with just regular expressions and back referencing Backreference in R.
dat<-data.frame(datestring=c("November to March","11/1/2001 to 12/8/2001"))
dat$from <- gsub("(.*) to (.*)","\\1",dat$datestring) # creates a new column 'from' that takes the first thing (before the 'to')
dat$to <- gsub("(.*) to (.*)","\\2",dat$datestring) # creates a new column 'to' that takes the second thing (after the 'to')
dat
datestring from to
1 November to March November March
2 11/1/2001 to 12/8/2001 11/1/2001 12/8/2001
Upvotes: 1