Reputation: 1
I am trying to make my data processing more efficient for a spatial temperature data project. I have a for loop that does what I want, but it is much too slow for processing multiple years of data. The loop looks at each spatial cell and, based on the 365 temperature values in that year, computes the frequency, duration, number, and temperature of heat events, which go into separate 2D data frames.
for (b in 1:299) { # longitude
  for (c in 1:424) { # latitude
    data <- year[b,c] # makes all temps into a vector
    for (d in 2:364) {
      if (data[d]>=Threshold & data[d+1]>=Threshold) {
        frequencydf[b,c]=frequencydf[b,c]+1
        tempsdf[b,c]=tempsdf[b,c]+data[d]
      } else if (data[d-1]>=Threshold & data[d]>=Threshold & data[d+1]<Threshold) {
        frequencydf[b,c]=frequencydf[b,c]+1
        numberdf[b,c]=numberdf[b,c]+1
        tempsdf[b,c]=tempsdf[b,c]+data[d]
      } else {
        frequencydf[b,c]=frequencydf[b,c]
        numberdf[b,c]=numberdf[b,c]
        tempsdf[b,c]=tempsdf[b,c]
      }
    }
    durationdf[b,c]=frequencydf[b,c]/numberdf[b,c]
    tempsdf[b,c]=tempsdf[b,c]/frequencydf[b,c]
  }
}
Therefore, I am trying to work with apply functions to speed up the process. I think I am running into issues when attempting to analyze each spatial cell by the values in the 3rd (time) dimension of my array.
I am starting with the frequency parameter and trying to create the same data frame as above.
frequencylist <- Apply(year_array, fun = frequency.calc1, margins=c(1, 2))
frequencydf <- as.data.frame(frequencylist)
Using this function:
frequency.calc1 = function(cell) {
  data <- as.vector(cell)
  frequency <- 0
  for (d in 2:364) {
    if (data[d]>=Threshold & data[d+1]>=Threshold) {
      frequency=frequency+1
    } else if (data[d-1]>=Threshold & data[d]>=Threshold & data[d+1]<Threshold) {
      frequency=frequency+1
    } else {
      frequency=frequency
    }
  }
  return(frequency) # return once, after the loop has visited every day
}
I am very new to creating functions and using the Apply function so any advice would be appreciated!
Upvotes: 0
Views: 77
Reputation: 1
The solution used to simplify the process is shown below. Sum functions with conditionals were used in place of the if statements. This made the process incredibly efficient and did not use the apply function or an additional function.
for (b in 1:299) { # longitude
  for (c in 1:424) { # latitude
    data <- year[b,c] # all temps for this cell, as a vector
    N <- length(data)
    prev <- data[1:(N-2)] >= Threshold # day d-1
    cur <- data[2:(N-1)] >= Threshold # day d
    nxt <- data[3:N] >= Threshold # day d+1
    ends <- prev & cur & !nxt # last day of a heat event
    inside <- cur & nxt # event day followed by another hot day
    number[b,c] <- sum(ends)
    frequency[b,c] <- sum(ends) + sum(inside)
    duration[b,c] <- frequency[b,c]/number[b,c]
    temps[b,c] <- sum(data[2:(N-1)][ends | inside])/frequency[b,c]
  }
}
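For comparison, the same per-cell calculation can also be pushed through base R's apply(). This is only a minimal sketch: it assumes the year is stored as a 3-D array indexed [longitude, latitude, day], uses random made-up data in place of the real temperatures, and heat_stats is a hypothetical helper name, not something from the original code. It uses base apply(), not the capital-A Apply() from the question.

```r
set.seed(1)
# Made-up stand-in for one year of data: 4 x 3 cells, 365 days each
year_array <- array(rnorm(4 * 3 * 365, mean = 30, sd = 5), dim = c(4, 3, 365))
Threshold <- 35

# Hypothetical per-cell helper: returns all four statistics for one day vector
heat_stats <- function(data, threshold) {
  n <- length(data)
  prev <- data[1:(n - 2)] >= threshold  # day d-1
  cur  <- data[2:(n - 1)] >= threshold  # day d
  nxt  <- data[3:n]       >= threshold  # day d+1
  ends <- prev & cur & !nxt             # last day of a heat event
  event_day <- ends | (cur & nxt)       # any day counted by the original loop
  frequency <- sum(event_day)
  number <- sum(ends)
  c(frequency = frequency,
    number    = number,
    duration  = frequency / number,
    temp      = sum(data[2:(n - 1)][event_day]) / frequency)
}

# apply() hands the day vector of each [lon, lat] cell to heat_stats;
# the result has dimensions (statistic, lon, lat)
stats <- apply(year_array, c(1, 2), heat_stats, threshold = Threshold)
frequency <- stats["frequency", , ]
number    <- stats["number", , ]
```

Each slice of stats is a longitude-by-latitude matrix, so as.data.frame() can be called on it if a data frame is needed.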
Thank you for your help @Carl Witthoft
Upvotes: 0
Reputation: 21532
For-loops and *apply functions run about the same speed. Your problem is all those "if"s.
First of all, you have two separate conditions both of which lead to incrementing frequency. Figure out how to combine them. Next, remember that the R language is vectorized, so you don't need a loop at all. With a little careful thought, you can write a line something like
frequency <- sum(data[1:(N-2)] >= threshold & data[2:(N-1)] >= threshold & data[3:N] < threshold)
I haven't checked all the ">" vs "<" but you get the idea.
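To sanity-check the idea, here is a minimal sketch on a made-up ten-day vector. The values, the lowercase threshold, and the variable names are illustrative assumptions, not taken from the question's data. Note the parentheses in 1:(N - 2): in R, ":" binds tighter than "-", so 1:N-2 means (1:N) - 2.

```r
# Made-up daily temperatures for one cell (illustrative values only)
data <- c(30, 36, 37, 31, 35, 36, 38, 29, 36, 30)
threshold <- 35
N <- length(data)

# Operator precedence pitfall: this is (1:N) - 2, which starts at -1
shifted <- 1:N - 2

# Three aligned windows over the series
prev <- data[1:(N - 2)] >= threshold  # day d-1
cur  <- data[2:(N - 1)] >= threshold  # day d
nxt  <- data[3:N]       >= threshold  # day d+1

# Both branches of the original if/else-if, counted in one vectorized line
frequency_vec <- sum((cur & nxt) | (prev & cur & !nxt))

# The explicit loop from the question, for comparison
frequency_loop <- 0
for (d in 2:(N - 1)) {
  if (data[d] >= threshold && data[d + 1] >= threshold) {
    frequency_loop <- frequency_loop + 1
  } else if (data[d - 1] >= threshold && data[d] >= threshold &&
             data[d + 1] < threshold) {
    frequency_loop <- frequency_loop + 1
  }
}
```

Both counts should agree, which is an easy way to confirm that the ">" and "<" comparisons ended up the right way round before running it on the full grid.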
As a side note, NEVER hard-code the range of a loop. You can start with "2" since your conditionals reference "d-1" but let the maximum value be defined as something like length(data) - 1
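A minimal sketch of that side note, which also shows one way to combine the two conditions. The helper name count_event_days and its arguments are hypothetical, made up for illustration. The loop bounds come from length(data), so the same function should work for a 365-day or a 366-day year.

```r
# Hypothetical helper: counts days that belong to a run of at least two
# hot days, skipping the first and last day as the original loop does
count_event_days <- function(data, threshold) {
  n <- length(data)
  if (n < 3) return(0)            # too short to look at both neighbours
  frequency <- 0
  for (d in 2:(n - 1)) {          # bounds derived from the data, not hard-coded
    hot_today <- data[d] >= threshold
    # Combined condition: hot today and hot on at least one adjacent day
    if (hot_today && (data[d + 1] >= threshold || data[d - 1] >= threshold)) {
      frequency <- frequency + 1
    }
  }
  frequency
}

short_run <- count_event_days(c(30, 36, 37, 31), 35)
leap_year <- count_event_days(rep(40, 366), 35)
```

The guard for very short inputs is there because 2:(n - 1) would count backwards in R when n is less than 3.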
Upvotes: 1