Lorraine McChesney

Reputation: 1

Calculating precipitation intensity in R

I have hourly precipitation data for multiple days. Is there any way for R to identify when precipitation is greater than zero, add those values together, and divide by how long it was raining to get the intensity (average rainfall) of the storm? I am new to R. I know how to get the mean precipitation for each day, but I'd rather have the mean rainfall for each rain event. Thanks

Upvotes: 0

Views: 1797

Answers (2)

aaryno

Reputation: 636

I downloaded the CSV from the link at the bottom of the URL you posted and got the data below, which I'll use for my example. Note that the DateUTC field in the last column has some stray HTML that I had to get rid of.

> str(dat)
'data.frame':   45 obs. of  15 variables:
 $ TimeEDT             : chr  "12:54 AM" "1:54 AM" "2:54 AM" "3:54 AM" ...
 $ TemperatureF        : num  62.1 62.1 60.8 61 62.1 62.1 62.1 64.9 66.9 69.1 ...
 $ Dew.PointF          : num  55.9 55 55.4 55.9 55.9 55.9 57 55.9 57 57 ...
 $ Humidity            : int  80 78 82 83 80 80 84 73 70 65 ...
 $ Sea.Level.PressureIn: num  29.9 29.9 29.9 29.9 29.9 ...
 $ VisibilityMPH       : num  10 10 10 10 10 10 10 10 10 10 ...
 $ Wind.Direction      : chr  "Calm" "SE" "Calm" "Calm" ...
 $ Wind.SpeedMPH       : chr  "Calm" "3.5" "Calm" "Calm" ...
 $ Gust.SpeedMPH       : chr  "-" "-" "-" "-" ...
 $ PrecipitationIn     : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Events              : chr  "" "" "" "" ...
 $ Conditions          : chr  "Clear" "Partly Cloudy" "Clear" "Overcast" ...
 $ WindDirDegrees      : int  0 140 0 0 0 0 0 180 200 170 ...
 $ DateUTC.br...       : chr  "2015-06-12 04:54:00<br />" "2015-06-12 05:54:00<br />" "2015-06-12 06:54:00<br />" "2015-06-12 07:54:00<br />" ...

To get the intensity of each precipitation event from this data.frame:

dat <- read.csv(url('http://www.wunderground.com/history/airport/KBTV/2015/6/12/DailyHistory.html?req_city=Burlington&req_state=VT&req_statename=&reqdb.zip=05401&reqdb.magic=1&reqdb.wmo=99999&format=1'),stringsAsFactors=FALSE)

# What do you want to do with NA? Assume no rain for now.
dat$PrecipitationIn <- as.numeric(dat$PrecipitationIn)
dat$PrecipitationIn[is.na(dat$PrecipitationIn)] <- 0

# Just look for changes in the sequence where precip starts or stops
# and adjust for boundary effects 
rainingAtStart<-dat$PrecipitationIn[1]>0
dif<-c(rainingAtStart,diff(dat$PrecipitationIn>0))        

startEvent <- which(dif>0)
endEvent <- which(dif<0)
# If it is still raining at the last row, close the final event there
if (dat$PrecipitationIn[nrow(dat)] > 0) {
  endEvent <- c(endEvent, nrow(dat))
}
X <- data.frame(cbind(startEvent,endEvent,
                      dat$DateUTC.br...[startEvent],
                      dat$DateUTC.br...[endEvent]))
names(X) <- c("indStart","indEnd","eventStart","eventEnd")

# Calculate the sum for each precip event
precipByEvent <- apply(X,1,function(x){ sum(dat$PrecipitationIn[x[1]:x[2]]) })
X$eventTotal <- precipByEvent
str(X)

 'data.frame':  3 obs. of  5 variables:
  $ indStart  : Factor w/ 3 levels "15","19","28": 1 2 3
  $ indEnd    : Factor w/ 3 levels "15","26","45": 1 2 3
  $ eventStart: Factor w/ 3 levels "2015-06-12 18:54:00<br />",..: 1 2 3
  $ eventEnd  : Factor w/ 3 levels "2015-06-12 18:54:00<br />",..: 1 2 3
  $ eventTotal: num  0.01 1.12 4.65

Fetching the data directly from the CSV link in the URL you gave leaves some stray HTML (the trailing <br />) in eventStart and eventEnd, and both columns are factors, so let's strip the HTML first and then turn them into time objects. Base R handles date-times with the POSIXct class, so no additional packages are needed.

X$eventStart <- gsub('<br />','',X$eventStart)
X$eventEnd <- gsub('<br />','',X$eventEnd)

Ideally, it would be a time object (POSIXct) rather than a chr object, which will allow you to do math on it:

X$eventStart <- as.POSIXct(X$eventStart,format="%Y-%m-%d %H:%M:%S")
X$eventEnd <- as.POSIXct(X$eventEnd,format="%Y-%m-%d %H:%M:%S")

Now you can get the intensity by dividing each event's total by its duration. The duration is rounded up a little, since we assume precipitation starts at the beginning and ends at the end of a monitoring interval; how you account for that is up to you.

X$inchesPerHour <- X$eventTotal / (as.double(difftime(X$eventEnd,X$eventStart,units="hours")))

str(X)
'data.frame':   3 obs. of  7 variables:
 $ indStart     : Factor w/ 3 levels "15","19","28": 1 2 3
 $ indEnd       : Factor w/ 3 levels "16","27","45": 1 2 3
 $ eventStart   : POSIXct, format: "2015-06-12 18:54:00" "2015-06-12 22:49:00" "2015-06-13 01:31:00"
 $ eventEnd     : POSIXct, format: "2015-06-12 20:54:00" "2015-06-13 00:50:00" "2015-06-13 03:54:00"
 $ eventTotal   : num  0.01 1.12 4.65
 $ inchesPerHour: num  0.005 0.555 1.951

Now your X data.frame has the event start and end times, the position (row) in the original data source that the start/end are derived from, the event precip total (inches) and the intensity (inches per hour).
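
Since the Wunderground link may not stay available, here is a minimal, self-contained sketch of the same start/stop detection on a toy series. The precip values and the times vector are made up for illustration, and the events data.frame is built directly from integer and POSIXct columns rather than through cbind, so treat it as one possible variant rather than the exact code above.

```r
# Toy hourly series (made-up values, inches per sample)
precip <- c(0, 0, 0.1, 0.3, 0, 0, 0.2, 0.2, 0.4, 0)
times <- as.POSIXct("2015-06-12 00:00:00", tz = "UTC") +
  3600 * (seq_along(precip) - 1)

wet <- precip > 0
# +1 where rain starts, -1 on the first dry sample after rain
dif <- c(as.integer(wet[1]), diff(wet))
startEvent <- which(dif > 0)
endEvent <- which(dif < 0)
# Close an event that is still running at the last sample
if (wet[length(wet)]) endEvent <- c(endEvent, length(wet))

events <- data.frame(indStart = startEvent, indEnd = endEvent,
                     eventStart = times[startEvent],
                     eventEnd = times[endEvent])

# Total per event; the closing dry row adds 0, so including it is harmless
events$eventTotal <- mapply(function(s, e) sum(precip[s:e]),
                            events$indStart, events$indEnd)
events$inchesPerHour <- events$eventTotal /
  as.double(difftime(events$eventEnd, events$eventStart, units = "hours"))
print(events)
```

With this toy input the two events run over rows 3 to 5 and 7 to 10. One thing to watch: an event that is still running at the last row ends on a wet row rather than on the following dry one, so its duration comes out one interval short (and is zero if only the last row is wet).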

Note on intensity and event duration:

There is some overestimation of the event duration, since we assume that rain starts at the beginning of the first sampling period in which precipitation is reported and ends at the beginning of the next period in which no precipitation is reported. Thus, a 5-minute event that starts and stops between samples (measurements, or rows) will be recorded with a one-hour duration. More interestingly, a 5-minute event that overlaps a measurement (say it rains 2 minutes before a measurement and 3 minutes after) will be treated as a two-hour event.

Upvotes: 0

Gregor Thomas

Reputation: 145775

The rle (run length encoding) function is very useful for this type of question. Using @aaryno's lovely data:

dat <- read.csv(url('http://www.wunderground.com/history/airport/KBTV/2015/6/12/DailyHistory.html?req_city=Burlington&req_state=VT&req_statename=&reqdb.zip=05401&reqdb.magic=1&reqdb.wmo=99999&format=1'),stringsAsFactors=FALSE)

# What do you want to do with NA? Assume no rain for now.
dat$PrecipitationIn = as.numeric(dat$PrecipitationIn)
dat$PrecipitationIn[is.na(dat$PrecipitationIn)] = 0

precip = dat$PrecipitationIn
consec_precip = rle(precip > 0)
# calculates runs of consecutive hours of rain

# create an ID for each run of consecutive hours of rain
storm_id = rep(0, length(precip))
storm_id[precip > 0] = rep(1:sum(consec_precip$values),
                           times = consec_precip$lengths[consec_precip$values])

# calculate mean precipitation within each consecutive rain period
tapply(precip, storm_id, mean)
# 0 corresponds to all the times with no rain
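
To see what rle returns and how the storm IDs line up with the rows, here is a small sketch on a made-up vector (the values are invented for illustration, not taken from the weather data). It uses seq_len instead of 1:n so that a series with no rain at all does not produce a bogus 1:0 sequence.

```r
# Made-up hourly precipitation values
precip <- c(0, 0.1, 0.3, 0, 0, 0.2)

consec_precip <- rle(precip > 0)
consec_precip$lengths  # run lengths: 1 2 2 1
consec_precip$values   # FALSE TRUE FALSE TRUE

# ID 0 marks dry hours; wet runs are numbered 1, 2, ...
storm_id <- rep(0, length(precip))
storm_id[precip > 0] <- rep(seq_len(sum(consec_precip$values)),
                            times = consec_precip$lengths[consec_precip$values])
storm_id               # 0 1 1 0 0 2

# Mean precipitation per ID (group "0" is the dry hours)
storm_means <- tapply(precip, storm_id, mean)
storm_means
```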

The rle approach depends on the data being evenly spaced; you'd need a more complicated approach if the times were irregular.
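
For irregular times, one possible sketch is to keep the rle run IDs but take each storm's duration from the timestamps instead of counting rows. The times and precip values below are made up, and the code assumes each reading covers the interval since the previous observation, which you would need to check against how your station actually reports.

```r
# Made-up, irregularly spaced observations
times <- as.POSIXct(c("2015-06-12 18:54:00", "2015-06-12 19:20:00",
                      "2015-06-12 19:54:00", "2015-06-12 21:54:00",
                      "2015-06-12 22:49:00", "2015-06-12 23:10:00",
                      "2015-06-12 23:54:00"), tz = "UTC")
precip <- c(0, 0.10, 0.20, 0, 0.05, 0.30, 0)

runs <- rle(precip > 0)
run_id <- rep(seq_along(runs$values), times = runs$lengths)
wet_runs <- which(runs$values)

storms <- do.call(rbind, lapply(wet_runs, function(r) {
  idx <- which(run_id == r)
  last <- idx[length(idx)]
  # Assumption: a reading covers the time since the previous observation,
  # so the storm is taken to start at the observation before the first wet one
  first <- max(idx[1] - 1, 1)
  hours <- as.double(difftime(times[last], times[first], units = "hours"))
  data.frame(start = times[first], end = times[last],
             total = sum(precip[idx]), hours = hours,
             inches_per_hour = sum(precip[idx]) / hours)
}))
print(storms)
```

A single wet reading in the very first row has no earlier observation to measure from, so its duration would be zero here; that case needs its own rule.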

Upvotes: 1
