Reputation: 1520
I have a string of the form
stamp = "section_d1_2010-07-01_08_00.txt"
and would like to extract parts of it. I have managed this by calling str_extract repeatedly to narrow down to the part I want, e.g. to grab the month:
month = str_extract(stamp,"2010.+")
month = str_extract(month,"-..")
month = str_extract(month,"..$")
However, this is terribly inefficient and there has to be a better way. For this particular example I can use
month = substr(stamp,17,18)
but I am looking for something more versatile (in case the number of digits changes).
I think I need a regular expression that grabs what comes AFTER certain delimiters (the _ or -, the 3rd _, etc.). I have tried using sub as well, but had the same problem: I needed several calls to home in on what I actually wanted.
An example of how to get, say, the month (07 here) and the hour (08 here) would be appreciated.
Upvotes: 4
Views: 11006
Reputation: 887851
For the month, you can try
gsub('^.*_\\d+-|-\\d+_.*$', '', stamp)
#[1] "07"
For the hour
library(stringr)
str_extract(stamp, '(?<=\\d_)\\d+(?=_\\d)')
#[1] "08"
Extracting both
str_extract_all(stamp, '(?<=\\d{4}[^0-9])\\d{2}|\\d{2}(?=[^0-9]\\d{2}\\.)')[[1]]
#[1] "07" "08"
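If the lookarounds above feel hard to maintain, one alternative is to match the whole date-time block once and pull the pieces out of capture groups. The following is only a sketch in base R: it assumes the stamp always contains a year-month-day_hour_minute block in that order, and the field names are my own labels, not anything taken from the question. Using [0-9]+ rather than a fixed width keeps it working if the number of digits changes.

```r
stamp <- "section_d1_2010-07-01_08_00.txt"

# One capture group per field; [0-9]+ accepts any number of digits.
pattern <- "([0-9]+)-([0-9]+)-([0-9]+)_([0-9]+)_([0-9]+)"

# regexec() records the positions of the full match and of each group;
# regmatches() turns those positions into the matched substrings.
m <- regmatches(stamp, regexec(pattern, stamp))[[1]]

# m[1] is the full match, so drop it and label the groups
# (the labels are assumed, chosen for illustration).
fields <- setNames(m[-1], c("year", "month", "day", "hour", "minute"))

fields[["month"]]
fields[["hour"]]
```

With the example stamp, fields[["month"]] is "07" and fields[["hour"]] is "08".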
Upvotes: 2
Reputation: 67988
You can simply use strsplit with the regex [-_] to get all the parts. The perl=TRUE option is not required for this pattern.
stamp <- "section_d1_2010-07-01_08_00.txt"
strsplit(stamp, '[-_]')[[1]]
# [1] "section" "d1" "2010" "07" "01" "08" "00.txt"
See demo.
https://regex101.com/r/cK4iV0/8
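To pick out the month and hour by name rather than by position, the split result can be labelled. This is just a sketch: it assumes the stamp always splits into exactly seven pieces in this order, and the names are made up for illustration. It also strips the file extension first, so the last piece is "00" rather than "00.txt".

```r
stamp <- "section_d1_2010-07-01_08_00.txt"

# Remove the trailing extension (everything from the last dot onwards).
base <- sub("\\.[^.]*$", "", stamp)

# Split on either delimiter.
parts <- strsplit(base, "[-_]")[[1]]

# Assumed labels for the seven pieces; adjust if the layout differs.
names(parts) <- c("prefix", "id", "year", "month", "day", "hour", "minute")

parts[c("month", "hour")]
```

For the example stamp this gives "07" for month and "08" for hour.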
Upvotes: 4