David Z

Reputation: 7041

Filter strings by their date content in R

Suppose I have a string vector (file names actually):

x <- c("abcd20090809.txt", "bc20100209.txt", "bcd19971109.txt",
       "abcef20120802.txt", "efg20151109.txt", "xyz19860102.txt")

The numbers in x are dates in the format yyyymmdd. I want to filter x to keep only the files dated before the year 2000, e.g. the output would be:

> xx
[1] "bcd19971109.txt" "xyz19860102.txt"

Upvotes: 0

Views: 34

Answers (2)

markus

Reputation: 26343

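You can use grep with value = TRUE. The pattern below matches names in which the first digit after the leading lowercase letters is 1, i.e. years 1000 to 1999: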

grep(pattern = "^[a-z]+1", x, value = TRUE)
# [1] "bcd19971109.txt" "xyz19860102.txt"
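The pattern above depends on the prefix being all lowercase letters. As a minimal sketch of a variant, assuming every name ends in an 8-digit yyyymmdd stamp followed by ".txt", the match can be anchored at the end of the name instead:

```r
x <- c("abcd20090809.txt", "bc20100209.txt", "bcd19971109.txt",
       "abcef20120802.txt", "efg20151109.txt", "xyz19860102.txt")

# Match a "1" followed by seven more digits directly before ".txt",
# i.e. an 8-digit date stamp whose year starts with 1
xx <- grep("1[0-9]{7}\\.txt$", x, value = TRUE)
xx
```

This still only works for a cutoff at a "round" year such as 2000; for an arbitrary cutoff the date has to be parsed and compared.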

edit

If we want to subset by the condition 'before 2010' we might do

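# Cutoff date: keep files dated strictly before this
thres <- as.Date("2010-01-01")
# Extract the digit run from each name and parse it as yyyymmdd
idx <- as.Date(unlist(regmatches(x, gregexpr("\\d+", text = x))), format = "%Y%m%d") < thres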
x[idx]
# [1] "abcd20090809.txt" "bcd19971109.txt"  "xyz19860102.txt" 
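The date comparison can also be wrapped in a small function so the cutoff becomes a parameter. This is only a sketch: files_before is a hypothetical helper name, and it assumes each name ends in an 8-digit yyyymmdd stamp followed by ".txt". Names that do not fit that shape parse to NA and are dropped.

```r
# Hypothetical helper: keep the file names whose embedded yyyymmdd
# date is strictly before `cutoff` (a Date or a "yyyy-mm-dd" string)
files_before <- function(files, cutoff) {
  # Pull out the 8 digits that sit directly before ".txt"
  stamp <- sub("^.*([0-9]{8})\\.txt$", "\\1", files)
  # Non-matching names are returned unchanged by sub() and become NA here
  dates <- as.Date(stamp, format = "%Y%m%d")
  files[!is.na(dates) & dates < as.Date(cutoff)]
}

x <- c("abcd20090809.txt", "bc20100209.txt", "bcd19971109.txt",
       "abcef20120802.txt", "efg20151109.txt", "xyz19860102.txt")

before_2000 <- files_before(x, "2000-01-01")
before_2010 <- files_before(x, "2010-01-01")
```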

Upvotes: 1

Dan

Reputation: 12074

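Here, I use substring to pull out the year by position, counting back from the end of each name (this assumes every name ends in yyyymmdd.txt). I then check the year against your condition (i.e., < 2000) and keep the elements of x for which it is TRUE.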

x <- c("abcd20090809.txt", "bc20100209.txt", "bcd19971109.txt",
       "abcef20120802.txt", "efg20151109.txt", "xyz19860102.txt")

x[as.numeric(substring(x, nchar(x) - 11, nchar(x) - 8)) < 2000]
#> [1] "bcd19971109.txt" "xyz19860102.txt"

Created on 2019-02-08 by the reprex package (v0.2.1)
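The positions above are tied to the four-character ".txt" extension. As a sketch of one way to loosen that, assuming only that the name minus its extension ends in yyyymmdd, the extension can be stripped first with tools::file_path_sans_ext:

```r
x <- c("abcd20090809.txt", "bc20100209.txt", "bcd19971109.txt",
       "abcef20120802.txt", "efg20151109.txt", "xyz19860102.txt")

# Drop the extension, whatever its length
stem <- tools::file_path_sans_ext(x)
# The year is the first 4 of the last 8 characters of the stem
yr <- as.numeric(substr(stem, nchar(stem) - 7, nchar(stem) - 4))
res <- x[yr < 2000]
res
```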

Upvotes: 1
