threeisles

Reputation: 331

How to calculate time since temperature was so cold

I would love some help calculating the time since the temperature was as cold as it was on a particular date.

So in the example data frame below, for the first record (01/07/2000), the previous time it was as cold as this (-1) was 01/01/2000, 182 days before.

For the second record (01/06/2000), the previous time it was that cold (2 degrees) was the previous month (01/05/2000), when it was actually colder (1 degree), so 31 days before.

df <- data.frame(date=as.Date(c("01/07/2000", "01/06/2000", "01/05/2000", 
                                "01/04/2000", "01/03/2000", "01/02/2000", 
                                "01/01/2000"), "%d/%m/%Y"), 
                 temperature =c(-1, 2, 1, 0, 1, 1, -1))
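The exact day counts for the two records described above can be checked with plain Date subtraction (a quick arithmetic check; 2000 is a leap year, so February has 29 days):

```r
# Dates in the question are written as dd/mm/yyyy, so give the format explicitly
jul <- as.Date("01/07/2000", "%d/%m/%Y")  # first record, -1 degree
jan <- as.Date("01/01/2000", "%d/%m/%Y")  # previous time it was -1 or colder
jun <- as.Date("01/06/2000", "%d/%m/%Y")  # second record, 2 degrees
may <- as.Date("01/05/2000", "%d/%m/%Y")  # previous month, 1 degree

as.numeric(jul - jan)  # 182 days
as.numeric(jun - may)  # 31 days
```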

I have tried modifying this approach (Calculate days since last event in R) but found it became unwieldy when calculating for each week.

Any ideas how you might calculate the number of days since the weather was that cold, for each week? Many thanks for your help.

Upvotes: 1

Views: 130

Answers (3)

jay.sf

Reputation: 72673

Suppose you have temperature data for several grid cells, like this:

#          date grid temp
# 1  2000-01-01    A   -1
# 2  2000-02-01    A   -1
# 3  2000-03-01    A   -1
# ...
# 10 2000-01-01    B    2
# 11 2000-02-01    B    1
# ...

You could use a split-apply-combine approach over the grids with by(). Within each grid cell, apply a Vectorize()d function that calculates the number of days since the temperature on a given date last occurred. If there is no earlier occurrence, it returns NA.

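# For each date x (dates sorted oldest first): keep the dates up to x with the
# same temperature as on x, then diff() the last two to get the days since the
# previous occurrence (NA if x is the first one)
f <- Vectorize(function(data, x) {
  diff(rev(with(data, date[date <= x & temp == temp[date == x]]))[2:1])
}, vectorize.args="x")
# Apply f within each grid cell and stack the results
res <- do.call(rbind, by(d, d$grid, function(g) cbind(g, last=f(g, g$date))))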

res
#            date grid temp last
# A.1  2000-01-01    A   -1   NA
# A.2  2000-02-01    A   -1   31
# A.3  2000-03-01    A   -1   29
# A.4  2000-04-01    A   -1   31
# A.5  2000-05-01    A    0   NA
# A.6  2000-06-01    A    2   NA
# A.7  2000-07-01    A    0   61
# A.8  2000-08-01    A    0   31
# A.9  2000-09-01    A   -1  153
# B.10 2000-01-01    B    2   NA
# B.11 2000-02-01    B    1   NA
# B.12 2000-03-01    B    2   60
# B.13 2000-04-01    B    1   60
# B.14 2000-05-01    B    2   61
# B.15 2000-06-01    B   -1   NA
# B.16 2000-07-01    B   -1   30
# B.17 2000-08-01    B    0   NA
# B.18 2000-09-01    B    2  123
# C.19 2000-01-01    C    0   NA
# C.20 2000-02-01    C    0   31
# C.21 2000-03-01    C    1   NA
# C.22 2000-04-01    C    1   31
# C.23 2000-05-01    C   -1   NA
# C.24 2000-06-01    C   -1   31
# C.25 2000-07-01    C    1   91
# C.26 2000-08-01    C    2   NA
# C.27 2000-09-01    C   -1   92
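The same split-apply-combine idea can also be written with split() and lapply() in place of by() and Vectorize(). A minimal sketch on a hypothetical toy data frame (d_toy, days_since_same and res_toy are illustrative names, not from the answer). Like f, it matches on equal temperature; changing == to <= would give the "at least as cold" variant the question asks for.

```r
# Hypothetical toy data: four monthly readings for each of two grid cells
d_toy <- data.frame(
  date = rep(seq(as.Date("2000-01-01"), by = "month", length.out = 4), 2),
  grid = rep(c("A", "B"), each = 4),
  temp = c(-1, 0, -1, -1, 2, 1, 2, 0)
)

# Days since the same temperature last occurred, for one grid cell whose
# rows are sorted oldest first (NA when there is no earlier occurrence)
days_since_same <- function(date, temp) {
  vapply(seq_along(temp), function(i) {
    prev <- which(temp[seq_len(i - 1)] == temp[i])  # earlier rows, same temp
    if (length(prev) == 0) return(NA_real_)
    as.numeric(date[i] - date[max(prev)])           # most recent of them
  }, numeric(1))
}

# Split by grid, apply to each piece, then combine with rbind
res_toy <- do.call(rbind, lapply(split(d_toy, d_toy$grid), function(g) {
  g <- g[order(g$date), ]
  g$last <- days_since_same(g$date, g$temp)
  g
}))
res_toy
```

For grid A this gives NA, NA, 60, 31 and for grid B NA, NA, 60, NA (1 January to 1 March 2000 is 60 days).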

Edit

To find out how long it has been since the temperature was below a specific threshold temp.th, we could modify the function like so:

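temp.th <- 0  # threshold: count days since temp was last below this
# x minus the most recent date up to x with temp below the threshold
# (0 if x itself is below it, NA if it has never been below it)
f2 <- Vectorize(function(data, x) {
  x - rev(with(data, date[date <= x & temp < temp.th]))[1]
}, vectorize.args="x")
res2 <- do.call(rbind, by(d, d$grid, function(g) cbind(g, last=f2(g, g$date))))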

res2
#            date grid temp last
# A.1  2000-01-01    A   -1    0
# A.2  2000-02-01    A   -1    0
# A.3  2000-03-01    A   -1    0
# A.4  2000-04-01    A   -1    0
# A.5  2000-05-01    A    0   30
# A.6  2000-06-01    A    2   61
# A.7  2000-07-01    A    0   91
# A.8  2000-08-01    A    0  122
# A.9  2000-09-01    A   -1    0
# B.10 2000-01-01    B    2   NA
# B.11 2000-02-01    B    1   NA
# B.12 2000-03-01    B    2   NA
# B.13 2000-04-01    B    1   NA
# B.14 2000-05-01    B    2   NA
# B.15 2000-06-01    B   -1    0
# B.16 2000-07-01    B   -1    0
# B.17 2000-08-01    B    0   31
# B.18 2000-09-01    B    2   62
# C.19 2000-01-01    C    0   NA
# C.20 2000-02-01    C    0   NA
# C.21 2000-03-01    C    1   NA
# C.22 2000-04-01    C    1   NA
# C.23 2000-05-01    C   -1    0
# C.24 2000-06-01    C   -1    0
# C.25 2000-07-01    C    1   30
# C.26 2000-08-01    C    2   61
# C.27 2000-09-01    C   -1    0
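For the threshold version there is also a vectorized alternative that avoids one search per date: a running maximum (cummax()) over the indices of below-threshold rows gives, for every row, the most recent cold observation. This is a swapped-in technique, not part of the answer above; it is a minimal sketch for a single grid cell, on hypothetical data sorted oldest first (dates, temps, last_cold and days_since_cold are illustrative names).

```r
# Hypothetical toy series for one grid cell, sorted oldest first
dates <- seq(as.Date("2000-01-01"), by = "month", length.out = 6)
temps <- c(1, -1, 0, 2, -1, 1)
temp.th <- 0

# Index of the most recent row (up to and including each row) with a
# temperature below the threshold; 0 means there has been none yet
last_cold <- cummax(ifelse(temps < temp.th, seq_along(temps), 0L))

# Days since that row (0 on a cold day itself), NA before the first cold day;
# pmax() avoids indexing with 0, which would silently drop elements
days_since_cold <- ifelse(last_cold > 0,
                          as.numeric(dates - dates[pmax(last_cold, 1L)]),
                          NA)
days_since_cold  # NA 0 29 60 0 31
```

It follows the same conventions as f2 (0 on a cold day, NA before any cold day), so it should be usable per grid cell in the same by() call, although that has not been checked against the output above.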

Data:

d <- expand.grid(date=seq(as.Date("2000-01-01"), as.Date("2000-09-01"), by="month"),
                 grid=LETTERS[1:3])
set.seed(42)
d$temp <- sample(-1:2, nrow(d), replace=TRUE)

Upvotes: 1

threeisles

Reputation: 331

Here is the working code, based on jay.sf's answer above. It uses <= instead of == so that it finds the last time it was at least as cold:

library(data.table)

df <- data.frame(date=as.Date(c("01/07/2000", "01/06/2000", "01/05/2000", "01/04/2000",
                                "01/03/2000", "01/02/2000", "01/01/2000",
                                "01/07/2000", "01/06/2000", "01/05/2000", "01/04/2000",
                                "01/03/2000", "01/02/2000", "01/01/2000"), "%d/%m/%Y"),
                 temperature=c(-1, 2, 1, 0, 1, 1, -1, -2, 3, 2, 0, -1, 2, -1),
                 met_square=c(1,1,1,1,1,1,1, 2,2,2,2,2,2,2))

setDT(df)
df3 <- df[order(date),]  # make sure the dates are in ascending order

# Days since the temperature was last at or below its value on date x
# (NA if it never was); relies on the dates being sorted oldest first
f <- Vectorize(function(data, x) {
  diff(rev(with(data, date[date <= x & temperature <= temperature[date == x]]))[2:1])
}, vectorize.args="x")

res <- do.call(rbind, by(df3, df3$met_square, function(g) cbind(g, last=f(g, g$date))))

res
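Because the Vectorize() version rescans the whole group for every date, its run time grows quadratically with the number of dates per group. If that becomes slow, one swapped-in alternative (not part of either answer) is a single pass with a stack, the classic "previous smaller-or-equal element" scan. A minimal sketch, assuming one group's dates are sorted oldest first; the function name days_since_as_cold is hypothetical.

```r
# Days since the temperature was last as cold as or colder than on each date.
# Assumes dates are sorted oldest first; NA when there is no earlier match.
days_since_as_cold <- function(date, temp) {
  n <- length(temp)
  out <- rep(NA_real_, n)
  stack <- integer(0)  # indices whose temps are non-decreasing bottom to top
  for (i in seq_len(n)) {
    # Pop earlier days that were warmer than today: for any later day, today
    # is a more recent and at-least-as-cold candidate, so they are never needed
    while (length(stack) > 0 && temp[stack[length(stack)]] > temp[i]) {
      stack <- stack[-length(stack)]
    }
    # Whatever is left on top is the most recent day at least as cold as today
    if (length(stack) > 0) {
      out[i] <- as.numeric(date[i] - date[stack[length(stack)]])
    }
    stack <- c(stack, i)
  }
  out
}

# met_square 1 from the data above, oldest first
dates <- as.Date(c("2000-01-01", "2000-02-01", "2000-03-01", "2000-04-01",
                   "2000-05-01", "2000-06-01", "2000-07-01"))
temps <- c(-1, 1, 1, 0, 1, 2, -1)
days_since_as_cold(dates, temps)  # NA 31 29 91 30 31 182
```

Since df3 is already sorted by date, it could then be applied per group with data.table's grouped assignment, e.g. df3[, last := days_since_as_cold(date, temperature), by = met_square].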

Upvotes: 0

Ronak Shah

Reputation: 388862

A base R option using sapply:

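# For each row x (newest first), look only at the older rows -(1:x) and take
# the first of them that was at least as cold; the last row has no older rows
c(sapply(seq_len(nrow(df) - 1), function(x) {
  tmp <- -(1:x)
  inds <- which(df$temperature[x] >= df$temperature[tmp])[1]
  df$date[x] - df$date[tmp][inds]
}), NA)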

#[1] 182  31  30  91  29  31  NA

This assumes your data are sorted in decreasing order of date, meaning the latest date comes first (as in your example data).


To apply this by group, wrap the code above in a function (this uses the df with a met_square column from the previous answer):

diff_days <- function(temp, date) {
  c(sapply(seq_len(length(temp) - 1), function(x) {
    tmp <- -(1:x)
    inds <- which(temp[x] >= temp[tmp])[1]
    date[x] - date[tmp][inds]
  }), NA)  
}

library(dplyr)
df %>% 
  group_by(met_square) %>% 
  mutate(result = diff_days(temperature, date)) %>%
  ungroup

#    date       temperature met_square result
#   <date>           <dbl>      <dbl>  <dbl>
# 1 2000-07-01          -1          1    182
# 2 2000-06-01           2          1     31
# 3 2000-05-01           1          1     30
# 4 2000-04-01           0          1     91
# 5 2000-03-01           1          1     29
# 6 2000-02-01           1          1     31
# 7 2000-01-01          -1          1     NA
# 8 2000-07-01          -2          2     NA
# 9 2000-06-01           3          2     31
#10 2000-05-01           2          2     30
#11 2000-04-01           0          2     31
#12 2000-03-01          -1          2     60
#13 2000-02-01           2          2     31
#14 2000-01-01          -1          2     NA
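If you would rather not load dplyr, base R's ave() can make the same per-group call. A minimal sketch, assuming the df with met_square from the data.table answer (rebuilt here as a plain data frame) and a copy of diff_days(); the explicit sort enforces the newest-first assumption and is a no-op for this data.

```r
# Data from the data.table answer above, as a plain data.frame
df <- data.frame(
  date = as.Date(rep(c("01/07/2000", "01/06/2000", "01/05/2000", "01/04/2000",
                       "01/03/2000", "01/02/2000", "01/01/2000"), 2), "%d/%m/%Y"),
  temperature = c(-1, 2, 1, 0, 1, 1, -1, -2, 3, 2, 0, -1, 2, -1),
  met_square = rep(1:2, each = 7)
)

# diff_days() as defined above: expects one group, newest date first
diff_days <- function(temp, date) {
  c(sapply(seq_len(length(temp) - 1), function(x) {
    tmp <- -(1:x)
    inds <- which(temp[x] >= temp[tmp])[1]
    date[x] - date[tmp][inds]
  }), NA)
}

# Sort newest first within each met_square
df <- df[order(df$met_square, -as.numeric(df$date)), ]

# ave() splits the row indices by met_square, applies diff_days() to each
# group's rows, and puts the results back in the original row positions
df$result <- ave(seq_len(nrow(df)), df$met_square,
                 FUN = function(i) diff_days(df$temperature[i], df$date[i]))
df
```

The result column should match the dplyr output above.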

Upvotes: 1
