Sean

Reputation: 133

Remove data not fitting a threshold based on other variables in R

I am attempting to remove specific data (NEE) that were collected under poor conditions (Ustar < ths). The threshold depends on the season. Currently I am using a for loop, which I understand R processes slowly, so I am looking for a better method. The data frame has multiple variables and is named Peaches (Peaches13 and Peaches14 for the two years).

DoY is Day of year, Ustar is a variable describing the conditions, and NEE is the observation which is thrown out under low Ustar values. The seasonal threshold is labeled as ths_1, 2, 3, or 4 depending upon the time of year. Bad data is labeled as -999 (due to requirements for other programs), but could be set as NA and then changed later if it makes the code more efficient.

This is for two years (Peaches13 and Peaches14), and the years do not need to be the same length. The thresholds are the same across years and differ only by season.

This is my current setup:

for (i in 1:length(Peaches13$DoY)){
  if((Peaches13$DoY[i] < 90)&&(Peaches13$Ustar[i] < ths_1)){
    Peaches13$NEE[i] <- -999
  }
  if((Peaches13$DoY[i] < 180)&&(Peaches13$DoY[i] >= 90)&&(Peaches13$Ustar[i] < ths_2)){
    Peaches13$NEE[i] <- -999
  }
  if((Peaches13$DoY[i] < 270)&&(Peaches13$DoY[i] >= 180)&&(Peaches13$Ustar[i] < ths_3)){
    Peaches13$NEE[i] <- -999
  }
  if((Peaches13$DoY[i] >= 270)&&(Peaches13$Ustar[i] < ths_4)){
    Peaches13$NEE[i] <- -999
  }
}

for (i in 1:length(Peaches14$DoY)){
  if((Peaches14$DoY[i] < 90)&&(Peaches14$Ustar[i] < ths_1)){
    Peaches14$NEE[i] <- -999
  }
  if((Peaches14$DoY[i] < 180)&&(Peaches14$DoY[i] >= 90)&&(Peaches14$Ustar[i] < ths_2)){
    Peaches14$NEE[i] <- -999
  }
  if((Peaches14$DoY[i] < 270)&&(Peaches14$DoY[i] >= 180)&&(Peaches14$Ustar[i] < ths_3)){
    Peaches14$NEE[i] <- -999
  }
  if((Peaches14$DoY[i] >= 270)&&(Peaches14$Ustar[i] < ths_4)){
    Peaches14$NEE[i] <- -999
  }
}

Upvotes: 1

Views: 124

Answers (1)

Jthorpe

Reputation: 10204

You don't need a for loop. For example, your first for loop could be replaced with:

badValues <- with(Peaches13,
                (((DoY < 90)&(Ustar < ths_1)) | 
                ((DoY < 180)&(DoY >= 90)&(Ustar < ths_2)) | 
                ((DoY < 270)&(DoY >= 180)&(Ustar < ths_3)) | 
                ((DoY >= 270)&(Ustar < ths_4)) ) )

Peaches13$NEE[badValues] <- -999

which would be much faster. You could also go the dplyr route as in:

library(dplyr)
df <- mutate(Peaches13, NEE = ifelse(badValues , -999, NEE))
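Since the thresholds are the same for both years, one way to avoid repeating the condition per data frame is to look up each row's seasonal threshold with findInterval() and wrap that in a small function. This is only a sketch: flag_low_ustar() is a made-up helper name, the threshold values and the sample data are invented for illustration, and the breaks 90/180/270 are taken from the loop in the question.

```r
# Hypothetical threshold values (ths_1 .. ths_4); the real ones are not given.
ths <- c(0.10, 0.15, 0.20, 0.12)

# Illustrative helper, not part of any package.
flag_low_ustar <- function(df, ths, breaks = c(90, 180, 270), bad = -999) {
  # findInterval() gives 0 for DoY < 90, 1 for [90, 180),
  # 2 for [180, 270) and 3 for DoY >= 270; add 1 to index into ths.
  season <- findInterval(df$DoY, breaks) + 1
  low <- df$Ustar < ths[season]
  # which() drops NA comparisons, so rows with a missing Ustar are left as-is.
  df$NEE[which(low)] <- bad
  df
}

# Small invented data frame standing in for Peaches13.
Peaches13 <- data.frame(
  DoY   = c(10, 100, 200, 300, 50),
  Ustar = c(0.05, 0.30, 0.10, 0.20, NA),
  NEE   = c(1.1, 2.2, 3.3, 4.4, 5.5)
)

Peaches13 <- flag_low_ustar(Peaches13, ths)
```

The same call would then work for the second year, e.g. Peaches14 <- flag_low_ustar(Peaches14, ths), and passing bad = NA would mark the rows as NA instead of -999.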

Upvotes: 2
