Reputation: 409
I want to search a long list of file names for a substring in this format:
"q4-2015"
"q2-2013"
"q3-2011"
and break each match down into two variables: quarter and year.
For example, the list of names can include:
"aaaaa-ttttt-eeee-q4-2015-file"
"aaaaaa-fffff-3333-q2-2012-file"
The code should loop through the file names and return the two variables. In the first case:
year = 2015, quarter = q4
and in the second case:
year = 2012, quarter = q2
etc.
Upvotes: 0
Views: 121
Reputation: 270248
1) strcapture. Using the test input shown reproducibly in the Note at the end, we can invoke strcapture from base R:
pat <- "(q\\d)-(\\d{4})"
strcapture(pat, x, list(quarter = "", year = 0))
giving:
quarter year
1 q4 2015
2 q2 2012
An alternative might be to have a numeric quarter column. In that case we would use pat <- "(\\d)-(\\d{4})" and list(quarter = 0, year = 0).
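As a sketch of the numeric-quarter variant just described, the following applies that pattern to the test input from the Note. The names pat_num and res are only illustrative. The pattern relies on the quarter digit being the first digit that is immediately followed by a hyphen and four digits, which holds for these two file names but is an assumption about file names in general.

```r
# Test input, copied from the Note at the end of this answer
x <- c("aaaaa-ttttt-eeee-q4-2015-file",
       "aaaaaa-fffff-3333-q2-2012-file")

# Capture the quarter digit (without the "q") and the four-digit year
pat_num <- "(\\d)-(\\d{4})"

# A numeric prototype (0) makes strcapture convert both columns to numeric
res <- strcapture(pat_num, x, list(quarter = 0, year = 0))
str(res)
```

If other digit-hyphen-digits runs can appear earlier in a name, anchoring on the literal q, as in "q(\\d)-(\\d{4})", would likely be safer.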
2) read.pattern. read.pattern in the gsubfn package could be used with the same pattern.
library(gsubfn)
read.pattern(text = x, pattern = pat, col.names = c("quarter", "year"),
as.is = TRUE)
giving:
quarter year
1 q4 2015
2 q2 2012
2a) Another approach is to use gsubfn's strapply to produce a yearqtr class object, from which we could readily extract the quarter and year, or we could just leave it as a yearqtr object:
library(gsubfn)
library(zoo)
ym <- do.call("c",
strapply(x, pat, q + y ~ as.yearqtr(paste(y, q, sep = "-"))))
ym
## [1] "2015 Q4" "2012 Q2"
data.frame(quarter = paste0("q", cycle(ym)), year = as.integer(ym),
stringsAsFactors = FALSE)
## quarter year
## 1 q4 2015
## 2 q2 2012
Note
# test input
x <- c("aaaaa-ttttt-eeee-q4-2015-file",
"aaaaaa-fffff-3333-q2-2012-file")
Upvotes: 0
Reputation: 389265
We can try with this pattern
captured_words <- sub(".*\\b(q\\d)-(\\d+)\\b.*", "\\1-\\2", x)
captured_words
#[1] "q4-2015" "q2-2012"
Here, we capture two terms: 1) q followed by a single-digit number and 2) the digits that follow the hyphen.
We can separate them and read them into a data frame using read.table:
read.table(text = paste0(captured_words, collapse = "\n"), sep = "-")
# V1 V2
#1 q4 2015
#2 q2 2012
data
x <- c("aaaaa-ttttt-eeee-q4-2015-file","aaaaaa-fffff-3333-q2-2012-file")
Upvotes: 1
Reputation: 522719
We can try using sub here:
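# Assumed input, inferred from the row names in the output below
input <- c("q4-2015", "q2-2013", "q3-2011")
# Keep only the quarter (the first capture group)
quarters <- sapply(input, function(x) {
sub(".*\\b(q\\d+)-\\d{4}\\b.*", "\\1", x)
})
# Keep only the four-digit year
years <- sapply(input, function(x) {
sub(".*\\bq\\d+-(\\d{4})\\b.*", "\\1", x)
})
df <- data.frame(quarters, years)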
df
quarters years
q4-2015 q4 2015
q2-2013 q2 2013
q3-2011 q3 2011
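As a minimal sketch, the same two patterns can be applied to the full file names from the question. Because sub is vectorized over its input, the sapply wrapper is not strictly needed. The names files, quarter and year are only illustrative, and converting the year with as.integer is an added assumption about the desired type.

```r
# File names taken from the question
files <- c("aaaaa-ttttt-eeee-q4-2015-file",
           "aaaaaa-fffff-3333-q2-2012-file")

# sub() works on the whole vector at once; keep only the captured group
quarter <- sub(".*\\b(q\\d+)-\\d{4}\\b.*", "\\1", files)
year <- as.integer(sub(".*\\bq\\d+-(\\d{4})\\b.*", "\\1", files))

data.frame(file = files, quarter, year)
```

Note that sub returns a string unchanged when the pattern does not match, so file names without a quarter-year part would need to be filtered out first, for example with grepl.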
Upvotes: 3