Reputation: 1467
I'm reading some netcdf files from a directory into R. The netcdf files are named according to some specific features of the data.
Here is an example:
aa <- c("dayavg_fcst_surf125.011_tmp.1962010100_1962123121.nc",
"dayavg_fcst_surf125.011_tmp.1972010100_1972123121.nc",
"dayavg_fcst_surf125.011_tmp.1982010100_1982123121.nc",
"dayavg_fcst_surf125.011_tmp.1992010100_1992123121.nc",
"dayavg_fcst_surf125.011_tmp.2002010100_2002123121.nc",
"dayavg_fcst_surf125.011_tmp.2010010100_2010123121.nc",
"dayavg_fcst_surf125.011_tmp.2012010100_2012123121.nc",
"dayavg_fcst_surf125.011_tmp.2014020100_2014022821.nc",
"dayavg_fcst_surf125.011_tmp.2014120100_2014123121.nc",
"dayavg_fcst_surf125.011_tmp.2015020100_2015022821.nc")
These were collected using the list.files function.
I would like to select (keep) a subset of these filenames (as strings), specifically the files that refer to the data collected in 2010 and 2014.
The year is indicated in the filenames immediately following the 'tmp.' string. For example, the first entry would be the year 1962, and so on.
To achieve this, I have tried the following:
iyears <- c(2010,2014)
ll <- list()
for (i in 1:length(iyears)){
ll[[i]] <- aa[grepl(iyears[i],aa)]
}
ll <- c(ll[[1]],ll[[2]])
which returns:
> ll
[1] "dayavg_fcst_surf125.011_tmp.1962010100_1962123121.nc" "dayavg_fcst_surf125.011_tmp.1972010100_1972123121.nc"
[3] "dayavg_fcst_surf125.011_tmp.1982010100_1982123121.nc" "dayavg_fcst_surf125.011_tmp.1992010100_1992123121.nc"
[5] "dayavg_fcst_surf125.011_tmp.2002010100_2002123121.nc" "dayavg_fcst_surf125.011_tmp.2010010100_2010123121.nc"
[7] "dayavg_fcst_surf125.011_tmp.2012010100_2012123121.nc" "dayavg_fcst_surf125.011_tmp.2014020100_2014022821.nc"
[9] "dayavg_fcst_surf125.011_tmp.2014120100_2014123121.nc" "dayavg_fcst_surf125.011_tmp.2015020100_2015022821.nc"
[11] "dayavg_fcst_surf125.011_tmp.2014020100_2014022821.nc" "dayavg_fcst_surf125.011_tmp.2014120100_2014123121.nc"
whereas the answer should be:
> ll
[1] "dayavg_fcst_surf125.011_tmp.2010010100_2010123121.nc" "dayavg_fcst_surf125.011_tmp.2014020100_2014022821.nc"
[3] "dayavg_fcst_surf125.011_tmp.2014120100_2014123121.nc"
The problem is that the date string in the file name is as follows:
yyyymmddhh
so, 2010 also appears in
"dayavg_fcst_surf125.011_tmp.1982010100_1982123121.nc",
due to 198[2 01 0]1.
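The false match can be checked directly. This is only a small sketch, and the variable names f and pos are introduced here for illustration:

```r
f <- "dayavg_fcst_surf125.011_tmp.1982010100_1982123121.nc"

# grepl() coerces the numeric pattern 2010 to the string "2010"
grepl(2010, f)                      # TRUE, although the file is for 1982

# Locate the first occurrence of "2010" and show the date field around it
pos <- as.integer(regexpr("2010", f, fixed = TRUE))
substr(f, pos - 3, pos + 6)         # "1982010100": the match spans year, month and day
```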
Can anyone suggest a method of obtaining the desired result?
Upvotes: 0
Views: 1740
Reputation: 3376
Why not use the pattern argument of list.files:
list.files(path = ".", pattern = NULL, all.files = FALSE, full.names = FALSE, recursive = FALSE, ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE)
pattern: an optional regular expression. Only file names which match the regular expression will be returned.
Ref: R help
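As a rough sketch of how that could look for this case: the temporary directory and the empty files below are made up purely so the example runs on its own, and the regular expression "tmp\\.(2010|2014)" is one possible pattern, not the only one.

```r
# Hypothetical setup: a temporary directory holding three empty files
d <- tempfile("nc_demo")
dir.create(d)
fnames <- c("dayavg_fcst_surf125.011_tmp.1982010100_1982123121.nc",
            "dayavg_fcst_surf125.011_tmp.2010010100_2010123121.nc",
            "dayavg_fcst_surf125.011_tmp.2014020100_2014022821.nc")
invisible(file.create(file.path(d, fnames)))

# Require the year to come directly after the literal "tmp." so that
# "2010" inside a month/day field (as in the 1982 file) is not matched
kept <- list.files(path = d, pattern = "tmp\\.(2010|2014)")
kept
```

With this, the filtering happens while the directory is listed, so no separate grepl step is needed afterwards.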
Upvotes: 0
The main trick is to specify where the actual year is in your strings. The following should work:
iyears <- c(2010,2014)
ll <- list()
for (i in seq_along(iyears)){
# anchor the year right after the fixed filename prefix
ll[[i]] <- aa[grepl(paste0("^dayavg_fcst_surf125\\.011_tmp\\.",iyears[i]),aa)]
}
# flatten the list; works for any number of years
ll <- unlist(ll)
# [1] "dayavg_fcst_surf125.011_tmp.2010010100_2010123121.nc"
# [2] "dayavg_fcst_surf125.011_tmp.2014020100_2014022821.nc"
# [3] "dayavg_fcst_surf125.011_tmp.2014120100_2014123121.nc"
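A loop-free variant of the same idea, offered only as a sketch: pull out the four digits that follow "_tmp." and compare them with the wanted years. The shortened aa vector and the names yr and sel are assumptions made for this example.

```r
# Shortened version of the file list from the question
aa <- c("dayavg_fcst_surf125.011_tmp.1982010100_1982123121.nc",
        "dayavg_fcst_surf125.011_tmp.2010010100_2010123121.nc",
        "dayavg_fcst_surf125.011_tmp.2014020100_2014022821.nc",
        "dayavg_fcst_surf125.011_tmp.2015020100_2015022821.nc")
iyears <- c(2010, 2014)

# Capture the four digits immediately after "_tmp." as the year
yr <- as.integer(sub(".*_tmp\\.([0-9]{4}).*", "\\1", aa))

# Keep only the files whose extracted year is one of the wanted years
sel <- aa[yr %in% iyears]
sel
```

Because the year is extracted once as a number, the comparison does not depend on where else those digits happen to appear in the name.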
Upvotes: 2
Reputation: 193667
Since the tmp. portion seems to be a regular feature in the file names, a very direct way to resolve this would be to use that as part of your search string:
> grep("tmp.2010|tmp.2014", aa, value = TRUE)
[1] "dayavg_fcst_surf125.011_tmp.2010010100_2010123121.nc"
[2] "dayavg_fcst_surf125.011_tmp.2014020100_2014022821.nc"
[3] "dayavg_fcst_surf125.011_tmp.2014120100_2014123121.nc"
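Two small refinements, sketched under the assumption that the years live in a vector such as iyears from the question: an unescaped "." in a regular expression matches any character, so escaping it makes the pattern stricter, and the alternation can be built from the vector instead of being typed out. The shortened aa and the name pat are illustrative only.

```r
aa <- c("dayavg_fcst_surf125.011_tmp.1982010100_1982123121.nc",
        "dayavg_fcst_surf125.011_tmp.2010010100_2010123121.nc",
        "dayavg_fcst_surf125.011_tmp.2014020100_2014022821.nc",
        "dayavg_fcst_surf125.011_tmp.2014120100_2014123121.nc")
iyears <- c(2010, 2014)

# An unescaped dot matches any single character, not only a literal "."
grepl("tmp.2010", "tmpX2010")       # TRUE
grepl("tmp\\.2010", "tmpX2010")     # FALSE

# Build the pattern tmp\.(2010|2014) from the vector of years
pat <- paste0("tmp\\.(", paste(iyears, collapse = "|"), ")")
res <- grep(pat, aa, value = TRUE)
res
```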
Upvotes: 4