adrrs

Reputation: 47

Extract part of a string: date and times

I have a variable that usually contains gibberish like this:

\n\t\n\t\n\t\n\t\tSeuat eselyt\n\t\t\t\t\t\n\t\t\tti 30.07.2019 klo 12:00 - 14:30\n\t\t\t\t\t\t\tTau ski 2342342 2342342\n\t\t\t\t\t\n\t\n

I am trying to extract the date (30.07.2019) and the time range (12:00 - 14:30). I'm not very good with parsing, so any help implementing this in R would be appreciated.

Upvotes: 3

Views: 92

Answers (4)

NelsonGon

Reputation: 13319

A somewhat lengthy, step-by-step base R/stringr approach:

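tst <- "\n\t\n\t\n\t\n\t\tSeuat eselyt\n\t\t\t\t\t\n\t\t\tti 30.07.2019 klo 12:00 - 14:30\n\t\t\t\t\t\t\tTau ski 2342342 2342342\n\t\t\t\t\t\n\t\n"
# Delete all newlines and tabs
cleaner <- gsub("\\n|\\t", "", tst)
# Split at whitespace that is followed by a lowercase letter (lookahead needs perl = TRUE)
split_txt <- strsplit(cleaner, "\\s(?=[a-z])", perl = TRUE)
# Pull out anything shaped like a dd.mm.yyyy date
dates <- stringr::str_extract_all(unlist(split_txt),
                                  "\\d{1,}\\.\\d{2,}\\.\\d{4}")
# Strip letters, then keep any piece containing a hyphen (the time range)
times <- stringr::str_extract_all(stringr::str_remove_all(unlist(split_txt),
                                                          "[A-Za-z]"), ".*\\-.*")
dates[lengths(dates) > 0]
# [[1]]
# [1] "30.07.2019"

trimws(times[lengths(times) > 0])
# [1] "12:00 - 14:30"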
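To make the intermediate steps easier to follow, here is a small sketch that reruns only the first two steps and prints what they produce. Note that deleting the newlines and tabs outright glues neighbouring words together (`eselytti`, `14:30Tau`); the later steps still work on this input because letters are stripped before the time range is extracted.

```r
tst <- "\n\t\n\t\n\t\n\t\tSeuat eselyt\n\t\t\t\t\t\n\t\t\tti 30.07.2019 klo 12:00 - 14:30\n\t\t\t\t\t\t\tTau ski 2342342 2342342\n\t\t\t\t\t\n\t\n"

# Step 1: delete every newline and tab (nothing is put in their place)
cleaner <- gsub("\\n|\\t", "", tst)
print(cleaner)

# Step 2: split at a whitespace character followed by a lowercase letter;
# the lookahead (?=[a-z]) is not consumed, so the letter stays in the next piece
split_txt <- strsplit(cleaner, "\\s(?=[a-z])", perl = TRUE)
print(split_txt[[1]])
```

The date ends up in the piece `"eselytti 30.07.2019"` and the time range in `"klo 12:00 - 14:30Tau"`, which is why the extraction steps only have to look inside individual pieces.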

Upvotes: 1

user11116003


For the date:

(\d{1,2}[\.\/]){2}((\d{4})|(\d{2}))


For the time:

\d{1,2}:\d{2}\s?-\s?\d{1,2}:\d{2}

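These patterns are written in plain regex syntax; inside an R string literal every backslash has to be doubled. A minimal base R sketch applying them with `regexpr()` and `regmatches()`, which return the first match in the string (`perl = TRUE` selects the PCRE engine; the variable names are just for illustration):

```r
x <- "\n\t\n\t\n\t\n\t\tSeuat eselyt\n\t\t\t\t\t\n\t\t\tti 30.07.2019 klo 12:00 - 14:30\n\t\t\t\t\t\t\tTau ski 2342342 2342342\n\t\t\t\t\t\n\t\n"

# The two patterns from above, with each backslash doubled for an R string
date_re <- "(\\d{1,2}[\\.\\/]){2}((\\d{4})|(\\d{2}))"
time_re <- "\\d{1,2}:\\d{2}\\s?-\\s?\\d{1,2}:\\d{2}"

# regexpr() locates the first match; regmatches() returns the matched text
found_date <- regmatches(x, regexpr(date_re, x, perl = TRUE))
found_time <- regmatches(x, regexpr(time_re, x, perl = TRUE))

found_date
# [1] "30.07.2019"
found_time
# [1] "12:00 - 14:30"
```

With a vector of strings, `regmatches()` drops the elements that have no match, so the result can be shorter than the input.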

Upvotes: 1

zx8754

Reputation: 56159

Split the string, then extract the date and the times:

x <- "\n\t\n\t\n\t\n\t\tSeuat eselyt\n\t\t\t\t\t\n\t\t\tti 30.07.2019 klo 12:00 - 14:30\n\t\t\t\t\t\t\tTau ski 2342342 2342342\n\t\t\t\t\t\n\t\n"

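lapply(strsplit(x, "[\n\t ]"), function(i){
  # tokens that look like dd.mm.yyyy (dots escaped so they match literally)
  dd <- i[ grepl("[0-9]{2}\\.[0-9]{2}\\.[0-9]{4}", i) ]
  # tokens that look like hh:mm
  tt <- i[ grepl("[0-9]{2}:[0-9]{2}", i) ]
  # the date, then the start and end times joined by "-"
  c(dd, paste(tt, collapse = "-"))
})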

# [[1]]
# [1] "30.07.2019"  "12:00-14:30"
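The function works because splitting on single newline, tab or space characters leaves the date and each clock time as separate tokens. A small sketch showing the non-empty tokens (runs of delimiters produce empty strings, which are dropped here with `nzchar()`):

```r
x <- "\n\t\n\t\n\t\n\t\tSeuat eselyt\n\t\t\t\t\t\n\t\t\tti 30.07.2019 klo 12:00 - 14:30\n\t\t\t\t\t\t\tTau ski 2342342 2342342\n\t\t\t\t\t\n\t\n"

# Split on every single newline, tab or space character
tokens <- strsplit(x, "[\n\t ]")[[1]]
# Keep only the non-empty tokens
tokens <- tokens[nzchar(tokens)]
print(tokens)
```

Among these tokens only `"30.07.2019"` looks like a date, and only `"12:00"` and `"14:30"` look like hh:mm, so pasting the latter two gives `"12:00-14:30"`.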

Upvotes: 1

jludewig

Reputation: 428

If you can rely on the date and the time range each occurring only once in your data, you can use regular expressions to extract them (here using a data frame):

library(tidyverse)
data <-
   tibble(gibberish_string = "\n\t\n\t\n\t\n\t\tSeuat eselyt\n\t\t\t\t\t\n\t\t\tti 30.07.2019 klo 12:00 - 14:30\n\t\t\t\t\t\t\tTau ski 2342342 2342342\n\t\t\t\t\t\n\t\n")

data %>% mutate(date = str_extract(gibberish_string,
                                   pattern = "\\d{1,2}\\.\\d{1,2}\\.\\d{4}"),
                # match the whole "start - end" range, not only the first hh:mm
                time = str_extract(gibberish_string,
                                   pattern = "\\d{1,2}:\\d{2}\\s*-\\s*\\d{1,2}:\\d{2}"))
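If the date and time can occur more than once, `str_extract_all()` returns every match instead of only the first. A sketch on a hypothetical two-entry string (the second entry is invented for illustration), assuming the n-th date belongs with the n-th time range:

```r
library(stringr)

# Hypothetical input: one string containing two date/time entries
s <- "ti 30.07.2019 klo 12:00 - 14:30\n\tke 31.07.2019 klo 9:00 - 10:15"

# str_extract_all() returns a list with one character vector per input string
dates <- str_extract_all(s, "\\d{1,2}\\.\\d{1,2}\\.\\d{4}")[[1]]
times <- str_extract_all(s, "\\d{1,2}:\\d{2}\\s*-\\s*\\d{1,2}:\\d{2}")[[1]]

# Pair them up by position (assumes equal counts, in matching order)
data.frame(date = dates, time = times)
```

Pairing by position is only safe if every date in the string is followed by exactly one time range; if that cannot be guaranteed, a single pattern with capture groups (for example via `str_match_all()`) keeps each date tied to its own times.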

Upvotes: 2
