Emma Tebbs

Reputation: 1467

How to select files in a directory according to a specific string in the filename?

I'm reading some netcdf files from a directory into R. The netcdf files are named according to some specific feature of the data.

Here is an example:

aa <- c("dayavg_fcst_surf125.011_tmp.1962010100_1962123121.nc",
        "dayavg_fcst_surf125.011_tmp.1972010100_1972123121.nc",
        "dayavg_fcst_surf125.011_tmp.1982010100_1982123121.nc",
        "dayavg_fcst_surf125.011_tmp.1992010100_1992123121.nc",
        "dayavg_fcst_surf125.011_tmp.2002010100_2002123121.nc",
        "dayavg_fcst_surf125.011_tmp.2010010100_2010123121.nc",
        "dayavg_fcst_surf125.011_tmp.2012010100_2012123121.nc",
        "dayavg_fcst_surf125.011_tmp.2014020100_2014022821.nc",
        "dayavg_fcst_surf125.011_tmp.2014120100_2014123121.nc",
        "dayavg_fcst_surf125.011_tmp.2015020100_2015022821.nc")

These were collected using the list.files function.

I would like to select (keep) a subset of these filenames (as strings), specifically the files that refer to the data collected in 2010 and 2014.

The year is indicated in the filenames following the 'tmp.' string. For example, the first entry is for the year 1962, and so on.

To achieve this, I have tried the following:

iyears <- c(2010,2014)
ll <- list()
for (i in 1:length(iyears)){
  ll[[i]] <- aa[grepl(iyears[i],aa)]
}
ll <- c(ll[[1]],ll[[2]])

which returns:

> ll
 [1] "dayavg_fcst_surf125.011_tmp.1962010100_1962123121.nc" "dayavg_fcst_surf125.011_tmp.1972010100_1972123121.nc"
 [3] "dayavg_fcst_surf125.011_tmp.1982010100_1982123121.nc" "dayavg_fcst_surf125.011_tmp.1992010100_1992123121.nc"
 [5] "dayavg_fcst_surf125.011_tmp.2002010100_2002123121.nc" "dayavg_fcst_surf125.011_tmp.2010010100_2010123121.nc"
 [7] "dayavg_fcst_surf125.011_tmp.2012010100_2012123121.nc" "dayavg_fcst_surf125.011_tmp.2014020100_2014022821.nc"
 [9] "dayavg_fcst_surf125.011_tmp.2014120100_2014123121.nc" "dayavg_fcst_surf125.011_tmp.2015020100_2015022821.nc"
[11] "dayavg_fcst_surf125.011_tmp.2014020100_2014022821.nc" "dayavg_fcst_surf125.011_tmp.2014120100_2014123121.nc"

whereas the answer should be:

> ll
[1] "dayavg_fcst_surf125.011_tmp.2010010100_2010123121.nc" "dayavg_fcst_surf125.011_tmp.2014020100_2014022821.nc"
[3] "dayavg_fcst_surf125.011_tmp.2014120100_2014123121.nc"

The problem is that the date string in the file name is as follows:

yyyymmddhh

so, 2010 also appears in

"dayavg_fcst_surf125.011_tmp.1982010100_1982123121.nc",

due to 198[2 01 0]1.

Can anyone suggest a method of obtaining the desired result?

Upvotes: 0

Views: 1740

Answers (3)

user1436187

Reputation: 3376

Why not use the pattern argument of list.files?

list.files(path = ".", pattern = NULL, all.files = FALSE, full.names = FALSE, recursive = FALSE, ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE)

pattern: an optional regular expression. Only file names which match the regular expression will be returned.

Ref: R help
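
As a rough sketch of how that could look here, the pattern can be built from the years and passed straight to list.files. The temporary directory and the empty files below are only stand-ins (an assumption for illustration) for the real data directory; the names `d`, `fnames`, `pat` and `sel` are made up for this example.

```r
# Stand-in directory with empty files named like the ones in the question
# (assumption: in practice `d` would be the real data directory).
d <- tempfile("ncdemo")
dir.create(d)
fnames <- c("dayavg_fcst_surf125.011_tmp.1982010100_1982123121.nc",
            "dayavg_fcst_surf125.011_tmp.2010010100_2010123121.nc",
            "dayavg_fcst_surf125.011_tmp.2014020100_2014022821.nc")
invisible(file.create(file.path(d, fnames)))

iyears <- c(2010, 2014)
# Builds "tmp\\.(2010|2014)": the year must directly follow the literal "tmp."
pat <- paste0("tmp\\.(", paste(iyears, collapse = "|"), ")")
sel <- list.files(path = d, pattern = pat)
print(sel)
```

Filtering at the list.files step means the unwanted names never enter the vector in the first place, so no second pass with grepl is needed.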

Upvotes: 0

user5363218


The main trick is to specify where the actual year is in your strings. The following should work:

iyears <- c(2010,2014)
ll <- list()

for (i in seq_along(iyears)){
  ll[[i]] <- aa[grepl(paste0("^dayavg_fcst_surf125\\.011_tmp\\.",iyears[i]),aa)]
}

ll <- unlist(ll)

# [1] "dayavg_fcst_surf125.011_tmp.2010010100_2010123121.nc"
# [2] "dayavg_fcst_surf125.011_tmp.2014020100_2014022821.nc"
# [3] "dayavg_fcst_surf125.011_tmp.2014120100_2014123121.nc"
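
A loop-free variant of the same idea, offered as a sketch rather than the only way to do it: pull the four-digit year out of each name and compare it against the wanted years. It assumes every filename contains "_tmp." followed immediately by the year; `yr` is a name introduced here for illustration, and `aa` is shortened to three entries to keep the example small.

```r
aa <- c("dayavg_fcst_surf125.011_tmp.1982010100_1982123121.nc",
        "dayavg_fcst_surf125.011_tmp.2010010100_2010123121.nc",
        "dayavg_fcst_surf125.011_tmp.2014020100_2014022821.nc")
iyears <- c(2010, 2014)

# Keep only the four digits that follow "_tmp." and convert them to integers
# (assumption: every name follows this layout).
yr <- as.integer(sub("^.*_tmp\\.([0-9]{4}).*$", "\\1", aa))

# Select the names whose extracted year is one of the wanted years
ll <- aa[yr %in% iyears]
print(ll)
```

Because the comparison is done on the extracted year rather than on the whole string, digits from the month, day and hour fields cannot produce a match.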

Upvotes: 2

A5C1D2H2I1M1N2O1R2T1

Reputation: 193667

Since the tmp. portion seems to be a regular feature in the file names, a very direct way to resolve this would be to use that as part of your search string:

> grep("tmp.2010|tmp.2014", aa, value = TRUE)
[1] "dayavg_fcst_surf125.011_tmp.2010010100_2010123121.nc"
[2] "dayavg_fcst_surf125.011_tmp.2014020100_2014022821.nc"
[3] "dayavg_fcst_surf125.011_tmp.2014120100_2014123121.nc"
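
One detail worth keeping in mind: in a regular expression an unescaped "." matches any character, not just a literal dot. For these filenames that makes no difference, but escaping it keeps the pattern strict. A small sketch (the string "tmpX2010" is a made-up name used only to show the difference, and `aa` is shortened to three entries):

```r
aa <- c("dayavg_fcst_surf125.011_tmp.1982010100_1982123121.nc",
        "dayavg_fcst_surf125.011_tmp.2010010100_2010123121.nc",
        "dayavg_fcst_surf125.011_tmp.2014020100_2014022821.nc")

# Unescaped "." matches any single character, so this made-up name matches
loose <- grepl("tmp.2010", "tmpX2010")
# Escaped "\\." only matches a literal dot, so the same name is rejected
strict <- grepl("tmp\\.2010", "tmpX2010")

# Same selection as above, with the dot escaped and the years grouped
hits <- grep("tmp\\.(2010|2014)", aa, value = TRUE)
print(hits)
```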

Upvotes: 4
