AOE_player


Need help coming up with a regex solution in R

I have several dates that I need to convert. I couldn't get a regex to work, so here is an alternative attempt that works but is unnecessarily long.

library(stringr)
# date string vector; only these two scenarios can occur
date <- c("3rd of June 18:09","22nd of January 22:19")

# substr() is used to strip the ordinal suffix and "of" (e.g. "rd of").
# I did not use a regex for this because I am not that great with regex.

all_date_corrected <- c()

for(i in date){

  # two-digit day (e.g. "22nd"): characters 3-7 hold the suffix and "of"
  if(nchar(stringr::word(i, 1))>=4){
    x<- gsub(substr(i, start= 3, stop=7), "", i)

    all_date_corrected <- c(all_date_corrected,
                            format(strptime(x,"%d %B %H:%M",tz="GMT"),
                                   format="%m-%d %H:%M"))
  }
  else{
    # one-digit day (e.g. "3rd"): characters 2-6 hold the suffix and "of"
    x<- gsub(substr(i, start= 2, stop=6), "", i)

    all_date_corrected <- c(all_date_corrected,
                            format(strptime(x,"%d %B %H:%M",tz="GMT"),
                                   format="%m-%d %H:%M"))

  }

}

print(all_date_corrected) #[1] "06-03 18:09" "01-22 22:19"

I am pretty sure I can get rid of the substr() call and the if statement by using gsub(). Here is my attempt:

gsub("([0-9]+).*?([A-Z])", "\\1", date[1]) #[1] "3une 18:09"

gsub("([0-9]+).*?([A-Z])", "\\1", date[2]) #[1] "22anuary 22:19"

As you can see, my pattern keeps eating the first letter of the month, and it doesn't insert a space either. I would appreciate it if somebody could help out. Thanks.
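The first letter of the month disappears because ([A-Z]) is captured as group 2, but the replacement string only writes back \\1, so that letter is thrown away together with everything .*? matched. A minimal sketch that keeps the same pattern and simply writes group 2 back after a space (assuming, as stated above, that only these two date shapes occur):

```r
date <- c("3rd of June 18:09", "22nd of January 22:19")

# Group 1: the day digits; group 2: the first (capital) letter of the month.
# Writing \\2 back, preceded by a space, restores the letter that was lost.
fixed <- gsub("([0-9]+).*?([A-Z])", "\\1 \\2", date)
print(fixed)  # "3 June 18:09" "22 January 22:19"
```

The result is already in the "%d %B %H:%M" shape that the loop passes to strptime().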

Answers (1)

Allan Cameron

You could try this. It captures the day, month and time in three capturing groups and returns a string which is amenable to strptime:

strptime(gsub("^(\\d+)\\w+ of (\\w+) (.*)$", "\\1 \\2 \\3", date), "%d %B %H:%M")
#> [1] "2020-06-03 18:09:00 BST" "2020-01-22 22:19:00 GMT"

Explanation

  • ^(\\d+) Captures the leading digits from the string
  • \\w+ of Matches, but doesn't capture, the ordinal suffix plus "of", e.g. "rd of" or "nd of"
  • (\\w+) Captures the full month name
  • (.*)$ Captures everything after the final space, i.e. the time
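One way to check what each group captures is to run the same pattern through base R's regexec() and regmatches(), which return the whole match followed by each capture group. A small sketch, reusing the date vector from the question:

```r
date <- c("3rd of June 18:09", "22nd of January 22:19")
pattern <- "^(\\d+)\\w+ of (\\w+) (.*)$"

# For each string: element 1 is the whole match, elements 2-4 are groups 1-3
parts <- regmatches(date, regexec(pattern, date))

# parts[[1]] is c("3rd of June 18:09", "3", "June", "18:09")
str(parts)
```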

The "\\1 \\2 \\3" replacement rewrites each string as the three capturing groups separated by single spaces, e.g. "3 June 18:09". strptime can then parse this using %d for the day, %B for the full month name and %H:%M for the time.
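The answer stops at a date-time object, while the loop in the question produced "%m-%d %H:%M" strings. A sketch of that last step, assuming the same tz = "GMT" as the question's loop (which also avoids the mixed BST/GMT labels in the output above) and an English or C locale, since %B parses month names according to the current locale:

```r
date <- c("3rd of June 18:09", "22nd of January 22:19")

# Step 1: keep only day, month and time, separated by single spaces
cleaned <- gsub("^(\\d+)\\w+ of (\\w+) (.*)$", "\\1 \\2 \\3", date)

# Step 2: parse (the missing year defaults to the current one),
# then reformat as month-day hour:minute
parsed <- strptime(cleaned, "%d %B %H:%M", tz = "GMT")
all_date_corrected <- format(parsed, "%m-%d %H:%M")
print(all_date_corrected)  # "06-03 18:09" "01-22 22:19"
```

Because gsub(), strptime() and format() are all vectorised, this replaces both the for loop and the if branch from the question.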
