Emma Tebbs

Reputation: 1467

Average gridded climate data for duplicated times in R

I have a gridded climate dataset, such as:

# generate time vector
time1 <- seq(14847.5,14974.5, by = 1)
time2 <- seq(14947.5,14974.5, by = 1)
time <- c(time1,time2)
time <- as.POSIXct(time*86400,origin='1970-01-01 00:00')

# generate lat and lon coordinates
lat <- seq(80,90, by = 1)
lon <- seq(20,30, by = 1)

# generate 3-dimensional array (lon x lat x time)
dat <- array(runif(length(lat)*length(lon)*length(time)),
             dim = c(length(lon),length(lat),length(time)))

such that

> dim(dat)
[1]  11  11 156

The three dimensions of the data correspond to longitude (dim = 1), latitude (dim = 2), and time (dim = 3).

The issue I have at the moment is that some of the times are repeated, which has something to do with overlapping sensors measuring the data. I would therefore like to keep only the unique times for dat and average the data for the duplicated times. For example, if a day appears twice, I want the mean of the two values in each latitude/longitude grid cell for that day.

I can find the unique times as:

# only select unique times
new_time <- unique(time)
unique_time <- unique(time)

The following code then loops through each grid cell (lat/lon) and averages the values for every duplicated day.

# loop through lat/lon coordinates to generate new data
new_dat <- array(dim = c(length(lon),length(lat),length(new_time)))
for(i in seq_along(lon)){
  for(ii in seq_along(lat)){
    dat2 <- dat[i,ii,]                      # time series for one grid cell
    dat2b <- numeric(length(unique_time))   # preallocate instead of growing from NA
    for(k in seq_along(unique_time)){
      idx <- time == unique_time[k]         # all slices sharing this time stamp
      dat2b[k] <- mean(dat2[idx], na.rm = TRUE)
    }
    new_dat[i,ii,] <- dat2b
  }
}

I'm convinced that this gives the correct answer, but I'm sure there is a much cleaner way to achieve it.

I should also note that my real data is quite large (the time dimension has about 7000 entries, i.e. k = 7000), so this innermost loop is not very efficient, to say the least.

Upvotes: 2

Views: 143

Answers (1)

Hack-R

Reputation: 23214

My original answer:

This is a bit more concise and efficient, using aggregate:

# new_dat is the preallocated array from the question
for(i in 1:length(lon)){
  for(ii in 1:length(lat)){
    # aggregate() returns one mean per distinct time, sorted by time
    new_dat[i,ii,] <- as.numeric(aggregate(dat[i,ii,], by=list(time),mean)$x)
  }
}

It still has two of the three loops, but it avoids creating dat2, dat2b, and unique_time.
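One detail worth keeping in mind with aggregate: its output is ordered by the sorted grouping values, whereas unique() keeps the order of first appearance. In the example data the times happen to be sorted already, so the two agree. The toy vectors below (t_unsorted and x are made up for illustration) sketch what happens when they are not:

```r
# Hypothetical toy data: time stamps that are NOT in sorted order
t_unsorted <- c(3, 1, 3, 2)
x <- c(10, 20, 30, 40)

# aggregate() groups by time and returns the groups in sorted order
agg <- aggregate(x, by = list(time = t_unsorted), mean)

agg$time                       # sorted grouping values
unique(t_unsorted)             # order of first appearance, which differs here

# Labels that are guaranteed to line up with agg$x
time_labels <- sort(unique(t_unsorted))
```

If the time vector might be unsorted, pairing the averaged values with sort(unique(time)) rather than unique(time) keeps the labels aligned.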

My improved answer:

f <- function(i, ii){as.numeric(aggregate(dat[i,ii,], by=list(time),mean)$x)}

# build the lon/lat index grid once instead of recomputing it on every access
grid <- expand.grid(1:length(lon),1:length(lat))
for(i in 1:nrow(grid)){
  new_dat[grid[i,1], grid[i,2],] <- f(grid[i,1], grid[i,2])
}

Got it down to just 1 loop. We could probably bypass that loop too with an apply.
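As a rough sketch of what a loop-free version might look like, the block below shows two options. The first uses apply with tapply, the second flattens the array to a matrix and uses rowsum, which is likely to scale better to thousands of time steps. The names grp, new_dat_apply, new_dat_rowsum, m, sums, and counts are my own, and tz = "UTC" plus set.seed are assumptions added only to make the example reproducible; neither option has been timed on the real data.

```r
# Rebuild the example data from the question
set.seed(1)
time1 <- seq(14847.5, 14974.5, by = 1)
time2 <- seq(14947.5, 14974.5, by = 1)
time <- as.POSIXct(c(time1, time2) * 86400, origin = "1970-01-01", tz = "UTC")
lat <- seq(80, 90, by = 1)
lon <- seq(20, 30, by = 1)
dat <- array(runif(length(lon) * length(lat) * length(time)),
             dim = c(length(lon), length(lat), length(time)))

unique_time <- unique(time)
# Integer group id per time slice, numbered in the order of unique_time
grp <- match(time, unique_time)

# Option 1: apply over every lon/lat cell, tapply over the time groups.
# The result has dim (group, lon, lat), so permute back to (lon, lat, group).
res <- apply(dat, c(1, 2), function(x) tapply(x, grp, mean, na.rm = TRUE))
new_dat_apply <- aperm(res, c(2, 3, 1))
dimnames(new_dat_apply) <- NULL

# Option 2: flatten to a (cell x time) matrix and sum by group with rowsum().
m <- matrix(dat, nrow = length(lon) * length(lat))
sums <- rowsum(t(m), group = grp, na.rm = TRUE)   # (group x cell) sums
counts <- rowsum(t(!is.na(m)) * 1, group = grp)   # non-NA count per group and cell
new_dat_rowsum <- array(t(sums / counts),
                        dim = c(length(lon), length(lat), length(unique_time)))
```

Both results are indexed like new_dat in the question, so new_dat_apply[i, ii, k] is the mean for grid cell (i, ii) at unique_time[k].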

Upvotes: 3
