Reputation: 556
I have several dates that I need to convert. I couldn't get a regex to work, so here is my alternative attempt. It works, but it is unnecessarily long.
library(stringr)
# date string vector; only these two scenarios can be present
date <- c("3rd of June 18:09","22nd of January 22:19")
# substr() is used to remove the text portion (e.g. "rd of"). I did not go
# with a regex for this because I am not that great with it.
all_date_corrected <- c()
for (i in date) {
  if (nchar(stringr::word(i, 1)) >= 4) {
    x <- gsub(substr(i, start = 3, stop = 7), "", i)
    all_date_corrected <- c(all_date_corrected,
                            format(strptime(x, "%d %B %H:%M", tz = "GMT"),
                                   format = "%m-%d %H:%M"))
  } else {
    x <- gsub(substr(i, start = 2, stop = 6), "", i)
    all_date_corrected <- c(all_date_corrected,
                            format(strptime(x, "%d %B %H:%M", tz = "GMT"),
                                   format = "%m-%d %H:%M"))
  }
}
print(all_date_corrected) #[1] "06-03 18:09" "01-22 22:19"
I am pretty sure I can get rid of substr and the if statement with gsub. Here is my attempt at that:
gsub("([0-9]+).*?([A-Z])", "\\1", date[1]) #[1] "3une 18:09"
gsub("([0-9]+).*?([A-Z])", "\\1", date[2]) #[1] "22anuary 22:19"
As you can see, my pattern keeps eating the first letter of the month and doesn't insert a space either. I would appreciate it if somebody could help out. Thanks.
Upvotes: 1
Views: 39
Reputation: 173858
You could try this. It captures the day, month and time in three capturing groups and returns a string that strptime can parse:
strptime(gsub("^(\\d+)\\w+ of (\\w+) (.*)$", "\\1 \\2 \\3", date), "%d %B %H:%M")
#> [1] "2020-06-03 18:09:00 BST" "2020-01-22 22:19:00 GMT"
Explanation
^(\\d+) captures the leading digits of the string (the day).
\\w+ of matches, but doesn't capture, the ordinal suffix and "of", e.g. "th of" or "rd of".
(\\w+) captures the full month name.
(.*)$ captures everything after the space that follows the month name, i.e. the time.
The replacement "\\1 \\2 \\3" means replace each string with its three captured groups separated by spaces, e.g. "3 June 18:09". strptime can then parse this using %d for the day, %B for the month and %H:%M for the time.
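To get the "%m-%d %H:%M" strings the question asks for, the same pattern can be split into two steps. This is only a sketch: the variable names cleaned and parsed are made up for illustration, tz = "GMT" is carried over from the question, and %B assumes an English locale for the month names. Because the input has no year, strptime fills in the current one, which does not matter here since the year is dropped again by format.

```r
# Example input taken from the question
date <- c("3rd of June 18:09", "22nd of January 22:19")

# Step 1: drop the ordinal suffix and "of", keeping day, month and time
# ('cleaned' is a hypothetical name used only for this illustration)
cleaned <- gsub("^(\\d+)\\w+ of (\\w+) (.*)$", "\\1 \\2 \\3", date)
print(cleaned)
#> [1] "3 June 18:09"     "22 January 22:19"

# Step 2: parse the cleaned strings, then format them as month-day hour:minute
# (tz = "GMT" as in the question; month names assume an English locale)
parsed <- strptime(cleaned, "%d %B %H:%M", tz = "GMT")
all_date_corrected <- format(parsed, format = "%m-%d %H:%M")
print(all_date_corrected)
#> [1] "06-03 18:09" "01-22 22:19"
```

Since gsub, strptime and format are all vectorised, this should replace both the for loop and the if/else branch from the question.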
Upvotes: 2