sfellows

Reputation: 1

multiApply function for loop on a 3D array

I am trying to make my data processing more efficient for a spatial temperature data project. I have a for loop that does what I want, but it is much too slow for processing multiple years of data. The loop looks at each spatial cell and, based on the 365 temperature values for that year, calculates the frequency, duration, number, and temperature of heat events, which go into separate 2D data frames.


for (b in 1:299) { # longitude
  for (c in 1:424) { # latitude
    data <- year[b, c, ] # all daily temps for this cell as a vector
    for (d in 2:364) {
      if (data[d] >= Threshold & data[d+1] >= Threshold) {
        # hot day followed by another hot day
        frequencydf[b,c] = frequencydf[b,c] + 1
        tempsdf[b,c] = tempsdf[b,c] + data[d]
      } else if (data[d-1] >= Threshold & data[d] >= Threshold & data[d+1] < Threshold) {
        # last day of a heat event
        frequencydf[b,c] = frequencydf[b,c] + 1
        numberdf[b,c] = numberdf[b,c] + 1
        tempsdf[b,c] = tempsdf[b,c] + data[d]
      } else {
        frequencydf[b,c] = frequencydf[b,c]
        numberdf[b,c] = numberdf[b,c]
        tempsdf[b,c] = tempsdf[b,c]
      }
    }
    durationdf[b,c] = frequencydf[b,c] / numberdf[b,c]
    tempsdf[b,c] = tempsdf[b,c] / frequencydf[b,c]
  }
}

Therefore, I am trying to use apply functions to speed up the process. I think I am running into issues when trying to analyze each spatial cell by the values along the 3rd (time) dimension of my array.

I am starting with the frequency parameter and trying to create the same data frame as above.

frequencylist <- Apply(year_array, fun = frequency.calc1, margins = c(1, 2))
frequencydf <- as.data.frame(frequencylist)

Using this function:

frequency.calc1 = function(cell) {
  data <- as.vector(cell)
  frequency <- 0
  for (d in 2:364) {
    if (data[d]>=Threshold & data[d+1]>=Threshold) {
      frequency=frequency+1
      
    }else if (data[d-1]>=Threshold & data[d]>=Threshold & data[d+1]<Threshold) {
      frequency=frequency+1
      
    }else {
      frequency=frequency
    }
    return(frequency)
  }
}

I am very new to creating functions and using the Apply function so any advice would be appreciated!

Upvotes: 0

Views: 77

Answers (2)

sfellows

Reputation: 1

The solution I used to simplify the process is shown below. Calls to sum() over vectorized logical conditions replace the if statements and the innermost loop over days. This made the process far more efficient, without needing the apply function or an additional function.

for (b in 1:299) {        # longitude
  for (c in 1:424) {      # latitude
    data <- year[b, c, ]  # daily temperatures for this cell
    N <- length(data)
    prev <- data[1:(N - 2)] >= Threshold  # day d - 1
    cur  <- data[2:(N - 1)] >= Threshold  # day d
    nxt  <- data[3:N] >= Threshold        # day d + 1
    ongoing <- cur & nxt          # hot day followed by another hot day
    ending  <- prev & cur & !nxt  # last day of a heat event
    frequency[b,c] <- sum(ongoing) + sum(ending)
    number[b,c] <- sum(ending)
    duration[b,c] <- frequency[b,c] / number[b,c]
    temps[b,c] <- sum(data[2:(N - 1)][ongoing | ending]) / frequency[b,c]
  }
}
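For anyone who would still like to drop the explicit double loop, the same per-cell calculation can probably be wrapped in a function and handed to base R's apply() over the first two dimensions. This is only a sketch under assumptions: heat_stats is a hypothetical helper name, the small year_array below is made-up toy data standing in for the real longitude x latitude x day array, and the threshold is passed as an argument instead of being read from a global.

```r
# Hypothetical per-cell summary (not from the original post)
heat_stats <- function(data, threshold) {
  n <- length(data)
  hot <- data >= threshold
  prev <- hot[1:(n - 2)]   # day d - 1
  cur  <- hot[2:(n - 1)]   # day d
  nxt  <- hot[3:n]         # day d + 1
  ongoing <- cur & nxt          # hot day followed by another hot day
  ending  <- prev & cur & !nxt  # last day of a heat event
  c(frequency = sum(ongoing) + sum(ending), number = sum(ending))
}

# Toy 2 x 2 x 8 array (invented values) in place of the real data
year_array <- array(20, dim = c(2, 2, 8))
year_array[1, 1, ] <- c(20, 31, 32, 25, 33, 34, 35, 20)

# apply() over margins 1 and 2 passes each cell's time series to heat_stats;
# the result has dimensions (statistic, longitude, latitude)
res <- apply(year_array, c(1, 2), heat_stats, threshold = 30)
frequency <- res["frequency", , ]
number <- res["number", , ]
```

As noted in the other answer, apply() is not expected to be faster than a for loop here; the speed-up comes from the vectorized comparisons inside the function.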

Thank you for your help @Carl Witthoft

Upvotes: 0

Carl Witthoft

Reputation: 21532

For-loops and *apply functions run at about the same speed. Your problem is all those "if"s.
First of all, you have two separate conditions both of which lead to incrementing frequency. Figure out how to combine them. Next, remember that the R language is vectorized, so you don't need a loop at all. With a little careful thought, you can write a line something like

frequency <- sum(data[1:(N-2)] >= threshold & data[2:(N-1)] >= threshold & data[3:N] < threshold)

I haven't checked all the ">" vs "<" but you get the idea.
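To make the shifted-comparison idea concrete, here is a minimal sketch on a made-up vector. The temperature values and the threshold of 30 are invented for illustration and are not taken from your data.

```r
# Toy data (hypothetical values)
data <- c(20, 31, 32, 25, 33, 34, 35, 20, 31, 20)
threshold <- 30
N <- length(data)

hot <- data >= threshold

# Line up each interior day with its neighbours by shifting the index range.
# The parentheses matter: 1:N-2 is parsed as (1:N) - 2, not 1:(N - 2).
prev <- hot[1:(N - 2)]   # day d - 1
cur  <- hot[2:(N - 1)]   # day d
nxt  <- hot[3:N]         # day d + 1

# Days that end a run of at least two hot days
n_events <- sum(prev & cur & !nxt)
n_events
# → 2
```

The two runs found are days 2 to 3 and days 5 to 7; the isolated hot day 9 is not counted because the day before it is below the threshold.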

As a side note, NEVER hard-code the range of a loop. You can start with "2" since your conditionals reference "d-1" but let the maximum value be defined as something like length(data) - 1
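If you do keep a loop, a sketch along these lines takes its bounds from the data itself. count_hot_pairs is a hypothetical helper name, and it only counts days where both day d and day d + 1 are at or above the threshold, so adapt the condition to whatever you actually need.

```r
# Hypothetical helper: loop bounds come from length(data), not a literal 364
count_hot_pairs <- function(data, threshold) {
  n <- length(data)
  count <- 0
  # 2:(n - 1) would count downwards for very short input, so guard against it
  if (n < 3) return(count)
  for (d in 2:(n - 1)) {
    # && is the scalar form of &, appropriate inside if()
    if (data[d] >= threshold && data[d + 1] >= threshold) {
      count <- count + 1
    }
  }
  count  # returned once, after the loop has finished
}

count_hot_pairs(c(20, 31, 32, 25, 33, 34, 35, 20), 30)
# → 3
```

The same function then works unchanged for a 366-day leap year. Note also where the result is returned: a return() placed inside the loop body exits the function on the first iteration.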

Upvotes: 1
