I would love some help calculating the time since the temperature was as cold as it was on a particular date.
In the example data frame below, for the first record (01/07/2000), the previous time it was as cold as this (-1) was 01/01/2000 (around 182 days before).
For the second record (01/06/2000), the previous time it was that cold (2 degrees) was the previous month (01/05/2000), when it was actually colder (1 degree), so around 30 days before.
df <- data.frame(date=as.Date(c("01/07/2000", "01/06/2000", "01/05/2000",
"01/04/2000", "01/03/2000", "01/02/2000",
"01/01/2000"), "%d/%m/%Y"),
temperature =c(-1, 2, 1, 0, 1, 1, -1))
I have tried modifying this approach (Calculate days since last event in R), but it became unwieldy when calculating for each week.
Any ideas on how to calculate the number of days since the weather was that cold, for each week? Many thanks indeed for your help.
Upvotes: 1
Suppose you have temperature data for different grids, like this:
# date grid temp
# 1 2000-01-01 A -1
# 2 2000-02-01 A -1
# 3 2000-03-01 A -1
# ...
# 10 2000-01-01 B 2
# 11 2000-02-01 B 1
# ...
You could use a split-apply-combine approach along the grids using by. In each grid unit, we apply a Vectorized function that calculates the difference in days since the previous occurrence of the temperature on a specific date. If there is no earlier event, it gives NA.
f <- Vectorize(function(data, x) {
diff(rev(with(data, date[date <= x & temp == temp[date == x]]))[2:1])
}, vectorize.args="x")
res <- do.call(rbind, by(d, d$grid, function(g) cbind(g, last=f(g, g$date))))
res
# date grid temp last
# A.1 2000-01-01 A -1 NA
# A.2 2000-02-01 A -1 31
# A.3 2000-03-01 A -1 29
# A.4 2000-04-01 A -1 31
# A.5 2000-05-01 A 0 NA
# A.6 2000-06-01 A 2 NA
# A.7 2000-07-01 A 0 61
# A.8 2000-08-01 A 0 31
# A.9 2000-09-01 A -1 153
# B.10 2000-01-01 B 2 NA
# B.11 2000-02-01 B 1 NA
# B.12 2000-03-01 B 2 60
# B.13 2000-04-01 B 1 60
# B.14 2000-05-01 B 2 61
# B.15 2000-06-01 B -1 NA
# B.16 2000-07-01 B -1 30
# B.17 2000-08-01 B 0 NA
# B.18 2000-09-01 B 2 123
# C.19 2000-01-01 C 0 NA
# C.20 2000-02-01 C 0 31
# C.21 2000-03-01 C 1 NA
# C.22 2000-04-01 C 1 31
# C.23 2000-05-01 C -1 NA
# C.24 2000-06-01 C -1 31
# C.25 2000-07-01 C 1 91
# C.26 2000-08-01 C 2 NA
# C.27 2000-09-01 C -1 92
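To see what the inner expression of f does for a single date, here is a minimal sketch on a small hypothetical data frame (toy and its values are made up for illustration and are not part of the data above). Note that it relies on the rows being in ascending date order within a grid, as they are in the example data.

```r
# Hypothetical toy data for a single grid; values are made up for illustration
toy <- data.frame(
  date = as.Date(c("2000-01-01", "2000-02-01", "2000-03-01", "2000-04-01")),
  temp = c(-1, 0, -1, 0)
)
x <- as.Date("2000-03-01")

# Dates up to and including x that share x's temperature
same <- with(toy, date[date <= x & temp == temp[date == x]])

# rev() puts the newest date first; [2:1] picks (previous, current),
# so diff() returns current - previous
gap <- diff(rev(same)[2:1])
gap

# With no earlier occurrence, element 2 does not exist and the result is NA
x0 <- as.Date("2000-01-01")
same0 <- with(toy, date[date <= x0 & temp == temp[date == x0]])
gap0 <- diff(rev(same0)[2:1])
gap0
```

Here gap is the distance between 2000-03-01 and the earlier -1 reading on 2000-01-01, while gap0 is NA because 2000-01-01 has no predecessor.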
To find out how long ago the temperature was last below a specific threshold temp.th, we could modify the function like so:
temp.th <- 0
f2 <- Vectorize(function(data, x) {
x - rev(with(data, date[date <= x & temp < temp.th]))[1]
}, vectorize.args="x")
res2 <- do.call(rbind, by(d, d$grid, function(g) cbind(g, last=f2(g, g$date))))
res2
# date grid temp last
# A.1 2000-01-01 A -1 0
# A.2 2000-02-01 A -1 0
# A.3 2000-03-01 A -1 0
# A.4 2000-04-01 A -1 0
# A.5 2000-05-01 A 0 30
# A.6 2000-06-01 A 2 61
# A.7 2000-07-01 A 0 91
# A.8 2000-08-01 A 0 122
# A.9 2000-09-01 A -1 0
# B.10 2000-01-01 B 2 NA
# B.11 2000-02-01 B 1 NA
# B.12 2000-03-01 B 2 NA
# B.13 2000-04-01 B 1 NA
# B.14 2000-05-01 B 2 NA
# B.15 2000-06-01 B -1 0
# B.16 2000-07-01 B -1 0
# B.17 2000-08-01 B 0 31
# B.18 2000-09-01 B 2 62
# C.19 2000-01-01 C 0 NA
# C.20 2000-02-01 C 0 NA
# C.21 2000-03-01 C 1 NA
# C.22 2000-04-01 C 1 NA
# C.23 2000-05-01 C -1 0
# C.24 2000-06-01 C -1 0
# C.25 2000-07-01 C 1 30
# C.26 2000-08-01 C 2 61
# C.27 2000-09-01 C -1 0
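In the output above, a 0 means that the date itself is below the threshold, because the condition uses date <= x. If only strictly earlier dates should count, one possible variant is sketched below. The name f3 and the toy data are hypothetical, and this is only one way to do it, assuming the same date and temp columns as above.

```r
temp.th <- 0

# Hypothetical toy data for a single grid; values are made up for illustration
toy <- data.frame(
  date = as.Date(c("2000-01-01", "2000-02-01", "2000-03-01")),
  temp = c(-1, 1, -1)
)

# f3 is a hypothetical variant of f2 that ignores the current date itself
f3 <- Vectorize(function(data, x) {
  # Strictly earlier dates on which the temperature was below the threshold
  prev <- with(data, date[date < x & temp < temp.th])
  if (length(prev) == 0) return(NA_real_)
  # Days between x and the most recent such date
  as.numeric(x) - as.numeric(max(prev))
}, vectorize.args = "x")

last_strict <- f3(toy, toy$date)
last_strict
```

Using max() instead of rev(...)[1] also removes the dependence on the row order within a grid.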
Data:
d <- expand.grid(date=seq(as.Date("2000-01-01"), as.Date("2000-09-01"), by="month"),
grid=LETTERS[1:3])
set.seed(42)
d$temp <- sample(-1:2, nrow(d), replace=TRUE)
Upvotes: 1
Here is the working code, based on Jay's answer above. The comparison uses <= instead of ==, so that colder temperatures also count:
library(data.table)
df <- data.frame(date=as.Date(c("01/07/2000", "01/06/2000", "01/05/2000", "01/04/2000", "01/03/2000", "01/02/2000", "01/01/2000", "01/07/2000", "01/06/2000", "01/05/2000", "01/04/2000", "01/03/2000", "01/02/2000", "01/01/2000"), "%d/%m/%Y"),
temperature =c(-1, 2, 1, 0, 1, 1, -1, -2, 3, 2, 0, -1, 2, -1 ),
met_square = c(1,1,1,1,1,1,1, 2,2,2,2,2,2,2))
setDT(df)
df3 <- df[order(date),] # making sure the dates are in the right order
f <- Vectorize(function(data, x) {
diff(rev(with(data, date[date <= x & temperature <= temperature[date == x]]))[2:1])
}, vectorize.args="x")
res <- do.call(rbind, by(df3, df3$met_square, function(g) cbind(g, last=f(g, g$date))))
res
Upvotes: 0
Base R option using sapply:
c(sapply(seq(nrow(df) - 1), function(x) {
tmp <- -(1:x)
inds <- which(df$temperature[x] >= df$temperature[tmp])[1]
df$date[x] - df$date[tmp][inds]
}), NA)
#[1] 182 31 30 91 29 31 NA
This assumes your data is sorted in decreasing order, meaning the latest date is first (same as in your example data).
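If your data is not already in that order, one way to get there is to sort with order() first. A minimal sketch, using the question's values in a deliberately shuffled order (the shuffling is only for illustration):

```r
# The question's data, deliberately shuffled to mimic unsorted input
df <- data.frame(
  date = as.Date(c("01/03/2000", "01/07/2000", "01/01/2000", "01/05/2000",
                   "01/02/2000", "01/06/2000", "01/04/2000"), "%d/%m/%Y"),
  temperature = c(1, -1, -1, 1, 1, 2, 0)
)

# Put the latest date first, as the sapply() code expects
df <- df[order(df$date, decreasing = TRUE), ]
rownames(df) <- NULL

# Same calculation as above, now on the sorted data
res <- c(sapply(seq(nrow(df) - 1), function(x) {
  tmp <- -(1:x)                                   # drop rows 1..x, keep earlier dates
  inds <- which(df$temperature[x] >= df$temperature[tmp])[1]
  df$date[x] - df$date[tmp][inds]
}), NA)
res
```

After sorting, the result should match the output shown above for the original data.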
To apply this by group, we can turn the above code into a function:
diff_days <- function(temp, date) {
c(sapply(seq_len(length(temp) - 1), function(x) {
tmp <- -(1:x)
inds <- which(temp[x] >= temp[tmp])[1]
date[x] - date[tmp][inds]
}), NA)
}
library(dplyr)
df %>%
group_by(met_square) %>%
mutate(result = diff_days(temperature, date)) %>%
ungroup()
# date temperature met_square result
# <date> <dbl> <dbl> <dbl>
# 1 2000-07-01 -1 1 182
# 2 2000-06-01 2 1 31
# 3 2000-05-01 1 1 30
# 4 2000-04-01 0 1 91
# 5 2000-03-01 1 1 29
# 6 2000-02-01 1 1 31
# 7 2000-01-01 -1 1 NA
# 8 2000-07-01 -2 2 NA
# 9 2000-06-01 3 2 31
#10 2000-05-01 2 2 30
#11 2000-04-01 0 2 31
#12 2000-03-01 -1 2 60
#13 2000-02-01 2 2 31
#14 2000-01-01 -1 2 NA
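If you would rather avoid the dplyr dependency, a base R sketch of the same grouped calculation could use split() and unsplit(). This is one possible alternative rather than part of the approach above; it repeats diff_days and the two-square example data so that it runs on its own.

```r
# diff_days as defined above, repeated so the sketch is self-contained
diff_days <- function(temp, date) {
  c(sapply(seq_len(length(temp) - 1), function(x) {
    tmp <- -(1:x)
    inds <- which(temp[x] >= temp[tmp])[1]
    date[x] - date[tmp][inds]
  }), NA)
}

# Example data with two squares, latest date first within each square
dates <- as.Date(c("01/07/2000", "01/06/2000", "01/05/2000", "01/04/2000",
                   "01/03/2000", "01/02/2000", "01/01/2000"), "%d/%m/%Y")
df <- data.frame(
  date = rep(dates, 2),
  temperature = c(-1, 2, 1, 0, 1, 1, -1, -2, 3, 2, 0, -1, 2, -1),
  met_square = rep(1:2, each = 7)
)

# Split by square, compute per group, then put the pieces back in row order
parts <- split(df, df$met_square)
per_group <- lapply(parts, function(g) diff_days(g$temperature, g$date))
df$result <- unsplit(per_group, df$met_square)
df
```

The result column should agree with the dplyr output above. Like diff_days itself, this assumes each group has at least two rows.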
Upvotes: 1