Reputation: 55
I have the 0.95 percentile threshold for temperature for each country. In the example below a week is 4 days. I want to count in a new vector/single-column-dataframe how many days each individual country's temperature is over that country's threshold on a weekly basis.
The country 95% percentile temperatures are:
q95 <- c(26,21,22,20,23)
DailyTempCountry <- data.frame(Date = c("W1D1","W1D2","W1D3","W1D4","W2D1","W2D2","W2D3","W2D4",
"W1D1","W1D2","W1D3","W1D4","W2D1","W2D2","W2D3","W2D4",
"W1D1","W1D2","W1D3","W1D4","W2D1","W2D2","W2D3","W2D4",
"W1D1","W1D2","W1D3","W1D4","W2D1","W2D2","W2D3","W2D4",
"W1D1","W1D2","W1D3","W1D4","W2D1","W2D2","W2D3","W2D4"),
Country = c("AL","AL", "AL", "AL","AL","AL", "AL", "AL",
"BE","BE", "BE", "BE", "BE","BE", "BE", "BE",
"CA","CA", "CA", "CA","CA","CA", "CA", "CA",
"DE","DE", "DE", "DE","DE","DE", "DE", "DE",
"UK","UK", "UK", "UK","UK","UK", "UK", "UK"),
DailyTemp = c(27,25,20,22,20,20,27,27,
24,22,23,18,17,19,20,16,
23,23,23,23,27,26,20,26,
19,18,17,19,16,15,19,18,
20,24,24,20,19,25,19,25))
DailyTempCountry
Date Country DailyTemp
1 W1D1 AL 27
2 W1D2 AL 25
3 W1D3 AL 20
4 W1D4 AL 22
5 W2D1 AL 20
6 W2D2 AL 20
7 W2D3 AL 27
8 W2D4 AL 27
9 W1D1 BE 24
10 W1D2 BE 22
11 W1D3 BE 23
12 W1D4 BE 18
13 W2D1 BE 17
14 W2D2 BE 19
15 W2D3 BE 20
16 W2D4 BE 16
17 W1D1 CA 23
18 W1D2 CA 23
19 W1D3 CA 23
20 W1D4 CA 23
21 W2D1 CA 27
22 W2D2 CA 26
23 W2D3 CA 20
24 W2D4 CA 26
25 W1D1 DE 19
26 W1D2 DE 18
27 W1D3 DE 17
28 W1D4 DE 19
29 W2D1 DE 16
30 W2D2 DE 15
31 W2D3 DE 19
32 W2D4 DE 18
33 W1D1 UK 20
34 W1D2 UK 24
35 W1D3 UK 24
36 W1D4 UK 20
37 W2D1 UK 19
38 W2D2 UK 25
39 W2D3 UK 19
40 W2D4 UK 25
What I want is a vector/column that counts the number of days in that week above the country's threshold like this:
DaysInWeekAboveQ95 <- c(1,2,3,0,4,3,0,0,2,2)
df_right <- data.frame(Week = c("W1","W2","W1","W2","W1","W2","W1","W2","W1","W2"),
DaysInWeekAboveQ95 = c(1,2,3,0,4,3,0,0,2,2))
Week DaysInWeekAboveQ95
1 W1 1
2 W2 2
3 W1 3
4 W2 0
5 W1 4
6 W2 3
7 W1 0
8 W2 0
9 W1 2
10 W2 2
The q95% vector was
q95 <- c(26,21,22,20,23)
so in the first week AL have 1 instance above its threshold value 26
. UK have 2 instances above 23 (UK's threshold) in the second week. And so for every country and every week.
I handled a similar problem but where the threshold did not vary by country but was just a constant 30 degrees (where I divide by 7 because seven days in week)
DaysAbove30perWeek <- as.data.frame(tapply(testdlong$value > 30,
ceiling(seq(nrow(testdlong))/7),sum))
Maybe a solution is to loop over countries? However, I can't figure out how to incorporate the specific loop. Other solutions are welcome.
Upvotes: 0
Views: 181
Reputation: 26218
In revised scenario you also need calculating a new column for week too
q95 <- c(26,21,22,20,23)
c_q95 <- data.frame(Country = unique(DailyTempCountry$Country),
threshold = q95)
library(dplyr)
DailyTempCountry %>% left_join(c_q95, by = 'Country') %>%
group_by(Country, Week = substr(Date, 1, 2)) %>%
summarise(days = sum(DailyTemp > threshold), .groups = 'drop')
# A tibble: 10 x 3
Country Week days
<chr> <chr> <int>
1 AL W1 1
2 AL W2 2
3 BE W1 3
4 BE W2 0
5 CA W1 4
6 CA W2 3
7 DE W1 0
8 DE W2 0
9 UK W1 2
10 UK W2 2
Created on 2021-05-05 by the reprex package (v2.0.0)
OP has asked that date variable is in some different format than given in sample data
time <- as.character(20000101:20000130)
> time
[1] "20000101" "20000102" "20000103" "20000104" "20000105" "20000106" "20000107" "20000108" "20000109" "20000110"
[11] "20000111" "20000112" "20000113" "20000114" "20000115" "20000116" "20000117" "20000118" "20000119" "20000120"
[21] "20000121" "20000122" "20000123" "20000124" "20000125" "20000126" "20000127" "20000128" "20000129" "20000130"
library(lubridate)
time <- ymd(time)
# Either ISO week
isoweek(time)
# or week
week(time)
> isoweek(time)
[1] 52 52 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3 4 4 4 4 4 4 4
> # or week
> week(time)
[1] 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3 4 4 4 4 4 4 4 5 5
library(lubridate) time <- ymd(time)
isoweek(time) week(time)
Upvotes: 1