Reputation: 31
I'm pretty new to coding in general, but am using R for a dissertation project. I have a netcdf file containing daily temperature data covering the whole of Kazakhstan with a resolution of 0.75 degrees of longitude and latitude, from 1979 to 2016. The 3 dimensions are time (size 13696), latitude (size 21), and longitude (size 61). Time has been measured in seconds since 1900.
Is there a way to get make a new array of monthly averages?
The only way I could figure out is a VERY inefficient way of doing this and now it has stopped working. My code is as follows:
mon.av <- array(dim = c(61,21,444))
for(years in 1:37) {
if(years == 1) {
for(x in 1:61){
for(y in 1:21){
mon.av[x, y, 1+(12*(years-1))] <- mean(m2tmp[x, y, 1+(years - 1)* 365:31+(years-1)*365])
mon.av[x, y, 2+(12*(years-1))] <- mean(m2tmp[x, y, 32+(years - 1)* 365:59+(years-1)*365])
mon.av[x, y, 3+(12*(years-1))] <- mean(m2tmp[x, y, 60+(years - 1)* 365:90+(years-1)*365])
mon.av[x, y, 4+(12*(years-1))] <- mean(m2tmp[x, y, 91+(years - 1)* 365:120+(years-1)*365])
mon.av[x, y, 5+(12*(years-1))] <- mean(m2tmp[x, y, 121+(years - 1)* 365:151+(years-1)*365])
mon.av[x, y, 6+(12*(years-1))] <- mean(m2tmp[x, y, 152+(years - 1)* 365:181+(years-1)*365])
mon.av[x, y, 7+(12*(years-1))] <- mean(m2tmp[x, y, 182+(years - 1)* 365:212+(years-1)*365])
mon.av[x, y, 8+(12*(years-1))] <- mean(m2tmp[x, y, 213+(years - 1)* 365:243+(years-1)*365])
mon.av[x, y, 9+(12*(years-1))] <- mean(m2tmp[x, y, 244+(years - 1)* 365:273+(years-1)*365])
mon.av[x, y, 10+(12*(years-1))] <- mean(m2tmp[x, y, 274+(years - 1)* 365:304+(years-1)*365])
mon.av[x, y, 11+(12*(years-1))] <- mean(m2tmp[x, y, 305+(years - 1)* 365:334+(years-1)*365])
mon.av[x, y, 12+(12*(years-1))] <- mean(m2tmp[x, y, 335+(years - 1)* 365:365+(years-1)*365])
}
}
}
}
Where I have to copy this out and change if(years == 1)
to the year number, and also have to change for leap years!
This seemed to be working ok until year 20 when the error message appeared:
Error in m2tmp[x, y, 1 + (years - 1) * 365:31 + (years - 1) * 365] : subscript out of bounds
So I would just like to know if there is an easier way to get the monthly average on this data, or if not, what is the mistake in my code as it is now?
Would really appreciate any help!
Upvotes: 2
Views: 2123
Reputation: 8107
I think this is easier to do in bash using CDO before you invoke R:
$ cdo monmean input.nc output.nc
you can find the documentation here: https://code.zmaw.de/projects/cdo/wiki/Cdo#Documentation
If you don't have it installed, on Ubuntu:
sudo apt-get install cdo
Upvotes: 2
Reputation: 315
For efficiency in R I would not do loops in general but create a dataframe that contains the columns LAT, LON, TIME and your TEMP value. From what I understand there are two steps:
1) transform date from sec since 1900 to year/month
2) calculate the monthly average for each location (i.e. lat/lon)
I am creating a dummy dataset here, with some random coordinates within Kazakhstan, random times in sec from 1900 and some random temperatures. Random time, lat, lon:
time = c(1, 13696, 1)
lat = rnorm(21, mean=46.77, sd=0.1)
lon = rnorm(61, mean=49.77, sd=0.1)
Create data.frame
climatedata = expand.grid(time=time, lat=lat, lon=lon)
> head(climatedata)
time lat lon
1 1 46.85790 49.6907
2 13696 46.85790 49.6907
3 1 46.85790 49.6907
4 1 46.76574 49.6907
5 13696 46.76574 49.6907
6 1 46.76574 49.6907
Create random temperature data
climatedata$temp = rnorm(dim(climatedata)[1], mean=20, sd=5)
Convert time from sec since 1900 into year-month (*I am not entirely sure if that is the correct one - better check this conversion: http://biostat.mc.vanderbilt.edu/wiki/pub/Main/ColeBeck/datestimes.pdf)
climatedata$date <- format(as.Date(as.POSIXct(climatedata$time, origin="1900-01-01")), "%Y-%m")
> head(climatedata)
time lat lon temp date
1 1 46.85790 49.6907 22.25540 1900-01
2 13696 46.85790 49.6907 15.39590 1900-01
3 1 46.85790 49.6907 19.23888 1900-01
4 1 46.76574 49.6907 16.94528 1900-01
5 13696 46.76574 49.6907 11.92085 1900-01
6 1 46.76574 49.6907 22.44737 1900-01
And then use the plyr package to calculate the mean temperature for each combination of date (year-month) and location (lat/lon) like so
avClimateData = ddply(climatedata, .(date, lon, lat),
summarise, monthlyAv = mean(temp))
Upvotes: 1