B Walker
B Walker

Reputation: 11

Add consecutive temp values above threshold to create "degree hours"

I am working with a dataset of hourly temperatures and I need to calculate "degree hours" above a heat threshold for each extreme event. I intend to run stats on the intensities (combined magnitude and duration) of each event to compare multiple sites over the same time period.

Example of data:

        Temp 
1     14.026
2     13.714
3     13.25
.....
21189 12.437
21190 12.558
21191 12.703
21192 12.896

Data after selecting only hours above the threshold of 18 degrees and then subtracting 18 to reveal degrees above 18:

       Temp
5297  0.010
5468  0.010
5469  0.343
5470  0.081
5866  0.010
5868  0.319
5869  0.652

After this step I need help to sum consecutive hours during which the reading exceeded my specified threshold.

What I am hoping to produce out of above sample:

       Temp
   1  0.010
   2  0.434
   3  0.010
   4  0.971

I've debated manipulating these data within a time series or by adding additional columns, but I do not want multiple rows for each warming event. I would immensely appreciate any advice.

Upvotes: 1

Views: 596

Answers (2)

Neal Fultz
Neal Fultz

Reputation: 9687

This is an alternative solution in base R.

You have some data that walks around, and you want to sum up the points above a cutoff. For example:

set.seed(99999)
x <- cumsum(rnorm(30))
plot(x, type='b')
abline(h=2, lty='dashed')

which looks like this:

enter image description here

First, we want to split the data in to groups based on when they cross the cutoff. We can use run length encoding on the indicator to get a compressed version:

x.rle <- rle(x > 2)

which has the value:

Run Length Encoding
  lengths: int [1:8] 5 2 3 1 9 4 5 1
  values : logi [1:8] FALSE TRUE FALSE TRUE FALSE TRUE ...

The first group is the first 5 points where x > 2 is FALSE; the second group is the two following points, and so on.

We can create a group id by replacing the values in the rle object, and then back transforming:

x.rle$values <- seq_along(x.rle$values)
group <- inverse.rle(x.rle)

Finally, we aggregate by group, keeping only the data above the cut off:

aggregate(x~group, subset = x > 2, FUN=sum)

Which produces:

  group            x
1     2  5.113291213
2     4  2.124118005
3     6 11.775435706
4     8  2.175868979

Upvotes: 2

rosscova
rosscova

Reputation: 5590

I'd use data.table for this, although there are certainly other ways.

library( data.table )
setDT( df )
temp.threshold <- 18

First make a column showing the previous value from each one in your data. This will help to find the point at which the temperature rose above your threshold value.

df[ , lag := shift( Temp, fill = 0, type = "lag" ) ]

Now use that previous value column to compare with the Temp column. Mark every point at which the temperature rose above the threshold with a 1, and all other points as 0.

df[ , group := 0L 
    ][ Temp > temp.threshold & lag <= temp.threshold, group := 1L ]

Now we can get cumsum of that new column, which will give each sequence after the temperature rose above the threshold its own group ID.

df[ , group := cumsum( group ) ]

Now we can get rid of every value not above the threshold.

df <- df[ Temp > temp.threshold, ]

And summarise what's left by finding the "degree hours" of each "group".

bygroup <- df[ , sum( Temp - temp.threshold ), by = group ]

I modified your input data a little to provide a couple of test events where the data rose above threshold:

structure(list(num = c(1L, 2L, 3L, 4L, 5L, 21189L, 21190L, 21191L, 
21192L, 21193L, 21194L), Temp = c(14.026, 13.714, 13.25, 20, 
19, 12.437, 12.558, 12.703, 12.896, 21, 21)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -11L), .Names = c("num", 
"Temp"), spec = structure(list(cols = structure(list(num = structure(list(), class = c("collector_integer", 
"collector")), Temp = structure(list(), class = c("collector_double", 
"collector"))), .Names = c("num", "Temp")), default = structure(list(), class = c("collector_guess", 
"collector"))), .Names = c("cols", "default"), class = "col_spec"))

With that data, here's the output of the code above (note $V1 is in "degree hours"):

> bygroup
   group V1
1:     1  3
2:     2  6

Upvotes: 1

Related Questions