Reputation: 11
I am working with a dataset of hourly temperatures and I need to calculate "degree hours" above a heat threshold for each extreme event. I intend to run stats on the intensities (combined magnitude and duration) of each event to compare multiple sites over the same time period.
Example of data:
Temp
1 14.026
2 13.714
3 13.25
.....
21189 12.437
21190 12.558
21191 12.703
21192 12.896
Data after selecting only hours above the threshold of 18 degrees and then subtracting 18 to reveal degrees above 18:
Temp
5297 0.010
5468 0.010
5469 0.343
5470 0.081
5866 0.010
5868 0.319
5869 0.652
After this step I need help to sum consecutive hours during which the reading exceeded my specified threshold.
What I am hoping to produce out of above sample:
Temp
1 0.010
2 0.434
3 0.010
4 0.971
I've debated manipulating these data within a time series or by adding additional columns, but I do not want multiple rows for each warming event. I would immensely appreciate any advice.
Upvotes: 1
Views: 596
Reputation: 9687
This is an alternative solution in base R.
You have some data that walks around, and you want to sum up the points above a cutoff. For example:
set.seed(99999)
x <- cumsum(rnorm(30))
plot(x, type='b')
abline(h=2, lty='dashed')
which looks like this:
First, we want to split the data in to groups based on when they cross the cutoff. We can use run length encoding on the indicator to get a compressed version:
x.rle <- rle(x > 2)
which has the value:
Run Length Encoding
lengths: int [1:8] 5 2 3 1 9 4 5 1
values : logi [1:8] FALSE TRUE FALSE TRUE FALSE TRUE ...
The first group is the first 5 points where x > 2 is FALSE; the second group is the two following points, and so on.
We can create a group id by replacing the values in the rle object, and then back transforming:
x.rle$values <- seq_along(x.rle$values)
group <- inverse.rle(x.rle)
Finally, we aggregate by group, keeping only the data above the cut off:
aggregate(x~group, subset = x > 2, FUN=sum)
Which produces:
group x
1 2 5.113291213
2 4 2.124118005
3 6 11.775435706
4 8 2.175868979
Upvotes: 2
Reputation: 5590
I'd use data.table
for this, although there are certainly other ways.
library( data.table )
setDT( df )
temp.threshold <- 18
First make a column showing the previous value from each one in your data. This will help to find the point at which the temperature rose above your threshold value.
df[ , lag := shift( Temp, fill = 0, type = "lag" ) ]
Now use that previous value column to compare with the Temp
column. Mark every point at which the temperature rose above the threshold with a 1, and all other points as 0.
df[ , group := 0L
][ Temp > temp.threshold & lag <= temp.threshold, group := 1L ]
Now we can get cumsum
of that new column, which will give each sequence after the temperature rose above the threshold its own group
ID.
df[ , group := cumsum( group ) ]
Now we can get rid of every value not above the threshold.
df <- df[ Temp > temp.threshold, ]
And summarise what's left by finding the "degree hours" of each "group".
bygroup <- df[ , sum( Temp - temp.threshold ), by = group ]
I modified your input data a little to provide a couple of test events where the data rose above threshold:
structure(list(num = c(1L, 2L, 3L, 4L, 5L, 21189L, 21190L, 21191L,
21192L, 21193L, 21194L), Temp = c(14.026, 13.714, 13.25, 20,
19, 12.437, 12.558, 12.703, 12.896, 21, 21)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -11L), .Names = c("num",
"Temp"), spec = structure(list(cols = structure(list(num = structure(list(), class = c("collector_integer",
"collector")), Temp = structure(list(), class = c("collector_double",
"collector"))), .Names = c("num", "Temp")), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
With that data, here's the output of the code above (note $V1
is in "degree hours"):
> bygroup
group V1
1: 1 3
2: 2 6
Upvotes: 1