bricevk

Reputation: 207

creating date variable from file names in R

I need some help creating a dataset in R where each observation contains a latitude, longitude, and date. Right now, I have roughly 2,000 files gridded by lat/long, and each file contains observations for one date. Ultimately, I need to combine all of these files into one file in which each observation carries a date variable pulled from the name of its source file.

So for instance, a file is named "MERRA2_400.tavg1_2d_flx_Nx.20120217.SUB.nc". I want all observations from that file to contain a date variable for 02/17/2012.
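As a minimal sketch of that step, the date can be pulled out of the name with a regular expression in base R. The helper name `extract_date` is hypothetical, and the pattern assumes every file name contains exactly one 8-digit YYYYMMDD block between two dots, as in the example above.

```r
# Hypothetical helper: extract the YYYYMMDD block from a file name
# and convert it to a Date. Assumes the stamp sits between two dots.
extract_date <- function(filename) {
  stamp <- sub(".*\\.([0-9]{8})\\..*", "\\1", basename(filename))
  as.Date(stamp, format = "%Y%m%d")
}

d <- extract_date("MERRA2_400.tavg1_2d_flx_Nx.20120217.SUB.nc")
format(d, "%m/%d/%Y")  # the date rendered as month/day/year
```

If a name does not match the pattern, `sub()` returns it unchanged and `as.Date()` yields `NA`, so bad names are easy to spot afterwards.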

That "nc" extension describes a netCDF file, which can be read into R as follows:

library(RNetCDF)
setwd("~/Desktop/Thesis Data")
p1a<-"MERRA2_300.tavg1_2d_flx_Nx.20050101.SUB.nc"
pid<-open.nc(p1a)
dat<-read.nc(pid)

I know the ldply command can be useful for extracting and designating a new variable from the file name. But I need to create a loop that combines all the files in the 'Thesis Data' folder above (set as my wd) and gives them date variables in the process.

I have been attempting this using two separate loops. The first loop reads the files in one by one, creates a date variable from the file name, and then saves them into a new folder. The second loop concatenates all files in that new folder. I have had little luck with this strategy.
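For comparison, the two loops could probably be folded into a single pass that tags each file's data with its date in memory and binds everything at the end. The sketch below is only illustrative: `combine_with_dates` and `fake_reader` are hypothetical names, the reader is a stand-in that returns a small data frame rather than real netCDF contents, and the second file name is made up for the demo.

```r
# Hypothetical helper: read each file with `read_one`, add a date column
# taken from the file name, then stack all pieces into one data frame.
combine_with_dates <- function(paths, read_one) {
  pieces <- lapply(paths, function(p) {
    df <- read_one(p)
    stamp <- sub(".*\\.([0-9]{8})\\..*", "\\1", basename(p))
    df$date <- as.Date(stamp, format = "%Y%m%d")  # same date on every row
    df
  })
  do.call(rbind, pieces)
}

# Stand-in reader so the sketch runs without any netCDF files
fake_reader <- function(path) data.frame(lat = c(1, 2), lon = c(3, 4))

paths <- c("MERRA2_400.tavg1_2d_flx_Nx.20120217.SUB.nc",
           "MERRA2_400.tavg1_2d_flx_Nx.20120218.SUB.nc")
combined <- combine_with_dates(paths, fake_reader)
```

Collecting the pieces in a list and calling `rbind` once avoids writing intermediate files and avoids regrowing the result on every iteration.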

[Image: view of dat]

As you can hopefully see in this picture, which describes the data file read in above, each file contains a time variable, but that variable holds a single value, 690, in every file. So I could replace that variable with the date from the file name, or I could create a new variable; either works.

Any help would be much appreciated!

Upvotes: 0

Views: 774

Answers (1)

denisafonin

Reputation: 1136

I do not have any experience working with .nc files, but what I think you need to do, in broad strokes, is this:

filenames <- list.files(path = ".", pattern = "\\.nc$") # character vector of the .nc file names in the working directory

Create an empty data frame with column names:

final_data <- data.frame(matrix(ncol = ..., nrow = 0)) # enter number of columns you will have in the final dataset
colnames(final_data) <- c("...", "...", "...", ...) # create column names

For each filename, read in the file, create the date column, and append the result to final_data:

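for (i in filenames) {
  pid <- open.nc(i)    # open one netCDF file
  dat <- read.nc(pid)  # read all of its variables (returned as a list)
  close.nc(pid)        # release the file handle

  # pull the 8-digit YYYYMMDD stamp out of the file name and convert it to a Date
  date <- as.Date(sub(".*\\.([0-9]{8})\\..*", "\\1", i), format = "%Y%m%d")

  dat$date <- date

  # note: dat may need converting to a data frame before this rbind() works
  final_data <- rbind(final_data, dat)
}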
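One caveat worth checking: `read.nc()` returns a list of arrays rather than a data frame, so each file likely needs flattening before `rbind()` will stack it row by row. The sketch below shows one way to do that with `expand.grid()`. It is an assumption-laden illustration: `grid_to_long` is a hypothetical helper, the coordinate names `lon` and `lat` and the variable name `flux` are placeholders, and it assumes the variable is a lon × lat matrix with a single time step. A plain list stands in for the `read.nc()` result so the example runs without a file.

```r
# Hypothetical helper: turn one gridded variable from a read.nc()-style list
# into a long data frame with one row per (lon, lat) cell.
grid_to_long <- function(dat, var, lon_name = "lon", lat_name = "lat") {
  grid <- expand.grid(lon = dat[[lon_name]], lat = dat[[lat_name]])
  values <- as.vector(dat[[var]])  # column-major: lon varies fastest
  stopifnot(length(values) == nrow(grid))
  grid[[var]] <- values
  grid
}

# Stand-in for what read.nc() might return for one file (names are assumptions)
fake <- list(lon = c(10, 20), lat = c(1, 2, 3), time = 690,
             flux = matrix(1:6, nrow = 2, ncol = 3))

long <- grid_to_long(fake, "flux")
long$date <- as.Date("20120217", format = "%Y%m%d")
```

Because `expand.grid()` varies its first argument fastest, the row order matches `as.vector()` on a lon × lat matrix. If the arrays in the real files are ordered differently, the arguments would need swapping.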

Upvotes: 0
