Reputation: 39
filelist <- c(
"http://content.caiso.com/green/renewrpt/20171015_DailyRenewablesWatch.txt",
"http://content.caiso.com/green/renewrpt/20171016_DailyRenewablesWatch.txt",
"http://content.caiso.com/green/renewrpt/20171017_DailyRenewablesWatch.txt",
"http://content.caiso.com/green/renewrpt/20171018_DailyRenewablesWatch.txt",
"http://content.caiso.com/green/renewrpt/20171019_DailyRenewablesWatch.txt",
"http://content.caiso.com/green/renewrpt/20171020_DailyRenewablesWatch.txt",
"http://content.caiso.com/green/renewrpt/20171021_DailyRenewablesWatch.txt",
"http://content.caiso.com/green/renewrpt/20171022_DailyRenewablesWatch.txt"
)
I am looking to extract the string between the 5th occurrence of / and the following _.
Example: from "http://content.caiso.com/green/renewrpt/20171015_DailyRenewablesWatch.txt" I would want 20171015.
I have tried
regmatches(filelist, regexpr("/{4}([^_]+)", filelist))
but it returns empty.
Upvotes: 1
Views: 2175
Reputation: 660
First, use basename() to strip the directory part of the URL:
filelist <- basename(filelist)
Then remove everything from the "_" onward using str_remove from the stringr package:
library(stringr)
str_remove(filelist, "_.*")
Output:
[1] "20171015" "20171016" "20171017" "20171018" "20171019" "20171020" "20171021" "20171022"
If you would like to turn the result into a date, see the ymd function in the lubridate package.
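As a minimal sketch of that last step (assuming the stringr and lubridate packages are installed, and using a shortened two-element filelist purely for illustration):

```r
library(stringr)
library(lubridate)

# Shortened example input; the full filelist from the question works the same way.
filelist <- c(
  "http://content.caiso.com/green/renewrpt/20171015_DailyRenewablesWatch.txt",
  "http://content.caiso.com/green/renewrpt/20171016_DailyRenewablesWatch.txt"
)

# Drop the directory part, then everything from the first underscore onward.
date_strings <- str_remove(basename(filelist), "_.*")

# Parse the "YYYYMMDD" strings into Date objects.
dates <- ymd(date_strings)
class(dates)
```

Keeping date_strings and dates as separate vectors makes it easy to use the character form for file names and the Date form for filtering or plotting.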
Upvotes: 0
Reputation: 269471
Here are a few approaches which use regular expressions:
sub(".*(\\d{8}).*", "\\1", filelist)
sub(".*/", "", sub("_.*", "", filelist))
sub("_.*", "", basename(filelist))
sapply(strsplit(filelist, "[/_]"), "[", 6)
gsub("\\D", "", filelist)
m <- gregexpr("\\d{8}", filelist)
unlist(regmatches(filelist, m))
strcapture("(\\d{8})", filelist, data.frame(character()))[[1]]
library(gsubfn)
strapplyc(filelist, "\\d{8}", simplify = TRUE)
These solutions do not use regular expressions at all:
substring(filelist, 41, 48)
substring(basename(filelist), 1, 8)
read.table(text = filelist, comment.char = "_", sep = "/")[[6]] # returns integer, not character
as.Date(basename(filelist), "%Y%m%d") # returns Date class object
Update: Added a few more approaches.
Upvotes: 1
Reputation: 32548
substr(x = filelist,
start = sapply(gregexpr(pattern = "/", filelist), function(x) x[5])+1,
stop = sapply(gregexpr(pattern = "_", filelist), function(x) x[1])-1)
#[1] "20171015" "20171016" "20171017" "20171018" "20171019" "20171020" "20171021"
#[8] "20171022"
Upvotes: 0
Reputation: 206187
This should work:
gsub("(?:.*/){4}([^_]+)_.*", "\\1", filelist)
# [1] "20171015" "20171016" "20171017" "20171018" "20171019" "20171020" "20171021"
# [8] "20171022"
We also need to match the text in front of each of the slashes, so the group being repeated is (?:.*/) rather than just /.
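To see why the original attempt came back empty, and for a variant that counts slashes explicitly, here is a small sketch in base R. The single-element url and the `^(?:[^/]*/){5}` pattern are my own illustration of "skip five slashes", not something taken from the question.

```r
url <- "http://content.caiso.com/green/renewrpt/20171015_DailyRenewablesWatch.txt"

# "/{4}" means four slashes in a row, which never occurs in this URL,
# so regexpr() reports no match (-1) and regmatches() has nothing to return.
hit <- as.integer(regexpr("/{4}([^_]+)", url))

# Skip exactly five "text followed by a slash" chunks from the start,
# capture everything up to the first underscore, and drop the rest.
id <- sub("^(?:[^/]*/){5}([^_]*)_.*$", "\\1", url, perl = TRUE)
id
```

Anchoring with ^ and using [^/]* instead of .* keeps the slash count exact, which may matter if the file name itself could ever contain a slash-like prefix or extra path segments.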
Upvotes: 4