Reputation: 207
I need help creating a dataset in R where each observation contains a latitude, longitude, and date. Right now, I have roughly 2,000 files gridded by lat/long, and each file contains observations for one date. Ultimately, I need to combine all of these files into one file where each observation has a date variable pulled from the name of its source file.
So for instance, a file is named "MERRA2_400.tavg1_2d_flx_Nx.20120217.SUB.nc". I want all observations from that file to contain a date variable for 02/17/2012.
That "nc" extension describes a netCDF file, which can be read into R as follows:
library(RNetCDF)
setwd("~/Desktop/Thesis Data")
p1a<-"MERRA2_300.tavg1_2d_flx_Nx.20050101.SUB.nc"
pid<-open.nc(p1a)
dat<-read.nc(pid)
I know the ldply command can be useful for extracting and designating a new variable from the file name. But I need to create a loop that combines all the files in the 'Thesis Data' folder above (set as my wd) and gives them date variables in the process.
I have been attempting this using two separate loops. The first loop reads the files in one by one, creates a date variable from the file name, and then saves each file into a new folder. The second loop concatenates all files in that new folder. I have had little luck with this strategy.
As you can hopefully see in this picture, which describes the data file uploaded above, each file contains a time variable, but that time variable has one observation, which is 690, in each file. So I could replace that variable with the date within the file name, or I could create a new variable - either works.
Any help would be much appreciated!
Upvotes: 0
Views: 774
Reputation: 1136
I do not have any experience working with .nc files, but what I think you need to do, in broad strokes, is this:
filenames <- list.files(path = ".", pattern = "\\.nc$") # Character vector of all .nc file names in the working directory
Creating empty dataframe with column names:
final_data <- data.frame(matrix(ncol = ..., nrow = 0)) # enter number of columns you will have in the final dataset
colnames(final_data) <- c("...", "...", "...", ...) # create column names
For each filename, read in the file, create a date column, and append the result to final_data:
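for (i in filenames) {
  pid <- open.nc(i)
  dat <- read.nc(pid) # read.nc() returns a list; it may need reshaping into a data frame before rbind()
  close.nc(pid) # release the file handle before moving on
  date <- ... # use regex to get your date from i and convert it into date
  dat$date <- date
  final_data <- rbind(final_data, dat)
}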
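To make the regex and reshaping steps a bit more concrete, here is a rough sketch in base R. I have not run it against real MERRA-2 files, so treat it as a starting point. The helper names (extract_date, grid_to_long, read_one) are made up for illustration, and the coordinate variable names "lon" and "lat" are assumptions; check them against what read.nc() actually returns for your files. It also assumes each data variable comes back as a lon x lat array once the length-1 time dimension is dropped.

```r
# Hypothetical helper: pull the 8-digit YYYYMMDD stamp out of a file name
# such as "MERRA2_400.tavg1_2d_flx_Nx.20120217.SUB.nc" and return a Date.
extract_date <- function(filename) {
  # Assumes the date is an 8-digit run with a dot on either side
  stamp <- sub(".*\\.([0-9]{8})\\..*", "\\1", basename(filename))
  as.Date(stamp, format = "%Y%m%d")
}

# Hypothetical helper: turn one file's contents (a list, as read.nc() returns)
# into a long data frame with one row per grid cell.
# lon_name / lat_name are assumed variable names, not confirmed from the files.
grid_to_long <- function(dat, date, lon_name = "lon", lat_name = "lat") {
  # expand.grid() varies its first argument fastest, which matches the
  # column-major order of an array dimensioned [lon, lat]
  grid <- expand.grid(lon = as.vector(dat[[lon_name]]),
                      lat = as.vector(dat[[lat_name]]))
  grid$date <- date
  n <- nrow(grid)
  for (v in setdiff(names(dat), c(lon_name, lat_name))) {
    # Keep only variables with one value per grid cell; this skips e.g. time
    if (length(dat[[v]]) == n) grid[[v]] <- as.vector(dat[[v]])
  }
  grid
}

# Hypothetical wrapper: read one file and attach the date from its name.
read_one <- function(f) {
  nc <- RNetCDF::open.nc(f)
  on.exit(RNetCDF::close.nc(nc)) # always close the handle, even on error
  grid_to_long(RNetCDF::read.nc(nc), extract_date(f))
}

# Usage, with the working directory set to the folder holding the .nc files:
# files <- list.files(pattern = "\\.nc$")
# final_data <- do.call(rbind, lapply(files, read_one))
```

Collecting the per-file data frames with lapply() and binding them once at the end avoids growing final_data inside the loop, which gets slow with around 2,000 files.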
Upvotes: 0