Emil Krabbe
Emil Krabbe

Reputation: 55

Count number of instances above a varying threshold

I have the 0.95 percentile threshold for temperature for each country. In the example below a week is 4 days. I want to count in a new vector/single-column-dataframe how many days each individual country's temperature is over that country's threshold on a weekly basis.

The country 95% percentile temperatures are:

 q95 <- c(26,21,22,20,23)


  DailyTempCountry <- data.frame(Date = c("W1D1","W1D2","W1D3","W1D4","W2D1","W2D2","W2D3","W2D4",
                                         "W1D1","W1D2","W1D3","W1D4","W2D1","W2D2","W2D3","W2D4",
                                          "W1D1","W1D2","W1D3","W1D4","W2D1","W2D2","W2D3","W2D4",
                                          "W1D1","W1D2","W1D3","W1D4","W2D1","W2D2","W2D3","W2D4",
                                          "W1D1","W1D2","W1D3","W1D4","W2D1","W2D2","W2D3","W2D4"),
                              Country = c("AL","AL", "AL", "AL","AL","AL", "AL", "AL",
                                    "BE","BE", "BE", "BE", "BE","BE", "BE", "BE",
                                    "CA","CA", "CA", "CA","CA","CA", "CA", "CA",
                                    "DE","DE", "DE", "DE","DE","DE", "DE", "DE",
                                    "UK","UK", "UK", "UK","UK","UK", "UK", "UK"),
                              DailyTemp = c(27,25,20,22,20,20,27,27,
                                            24,22,23,18,17,19,20,16,
                                             23,23,23,23,27,26,20,26,
                                            19,18,17,19,16,15,19,18,
                                             20,24,24,20,19,25,19,25))
 DailyTempCountry



Date Country DailyTemp
1  W1D1      AL        27
2  W1D2      AL        25
3  W1D3      AL        20
4  W1D4      AL        22
5  W2D1      AL        20
6  W2D2      AL        20
7  W2D3      AL        27
8  W2D4      AL        27
9  W1D1      BE        24
10 W1D2      BE        22
11 W1D3      BE        23
12 W1D4      BE        18
13 W2D1      BE        17
14 W2D2      BE        19
15 W2D3      BE        20
16 W2D4      BE        16
17 W1D1      CA        23
18 W1D2      CA        23
19 W1D3      CA        23
20 W1D4      CA        23
21 W2D1      CA        27
22 W2D2      CA        26
23 W2D3      CA        20
24 W2D4      CA        26
25 W1D1      DE        19
26 W1D2      DE        18
27 W1D3      DE        17
28 W1D4      DE        19
29 W2D1      DE        16
30 W2D2      DE        15
31 W2D3      DE        19
32 W2D4      DE        18
33 W1D1      UK        20
34 W1D2      UK        24
35 W1D3      UK        24
36 W1D4      UK        20
37 W2D1      UK        19
38 W2D2      UK        25
39 W2D3      UK        19
40 W2D4      UK        25

What I want is a vector/column that counts the number of days in that week above the country's threshold like this:

  DaysInWeekAboveQ95 <- c(1,2,3,0,4,3,0,0,2,2)
df_right <- data.frame(Week = c("W1","W2","W1","W2","W1","W2","W1","W2","W1","W2"),
            DaysInWeekAboveQ95 = c(1,2,3,0,4,3,0,0,2,2))

 Week DaysInWeekAboveQ95
1    W1                  1
2    W2                  2
3    W1                  3
4    W2                  0
5    W1                  4
6    W2                  3
7    W1                  0
8    W2                  0
9    W1                  2
10   W2                  2

The q95% vector was

q95 <- c(26,21,22,20,23)

so in the first week AL have 1 instance above its threshold value 26. UK have 2 instances above 23 (UK's threshold) in the second week. And so for every country and every week.

I handled a similar problem but where the threshold did not vary by country but was just a constant 30 degrees (where I divide by 7 because seven days in week)

DaysAbove30perWeek <- as.data.frame(tapply(testdlong$value > 30,
                                               ceiling(seq(nrow(testdlong))/7),sum))

Maybe a solution is to loop over countries? However, I can't figure out how to incorporate the specific loop. Other solutions are welcome.

Upvotes: 0

Views: 181

Answers (1)

AnilGoyal
AnilGoyal

Reputation: 26218

In revised scenario you also need calculating a new column for week too


q95 <- c(26,21,22,20,23)

c_q95 <- data.frame(Country = unique(DailyTempCountry$Country),
                    threshold = q95)

library(dplyr)

DailyTempCountry %>% left_join(c_q95, by = 'Country') %>%
  group_by(Country, Week = substr(Date, 1, 2)) %>%
  summarise(days = sum(DailyTemp > threshold), .groups = 'drop')

# A tibble: 10 x 3
   Country Week   days
   <chr>   <chr> <int>
 1 AL      W1        1
 2 AL      W2        2
 3 BE      W1        3
 4 BE      W2        0
 5 CA      W1        4
 6 CA      W2        3
 7 DE      W1        0
 8 DE      W2        0
 9 UK      W1        2
10 UK      W2        2

Created on 2021-05-05 by the reprex package (v2.0.0)

OP has asked that date variable is in some different format than given in sample data

time <- as.character(20000101:20000130)
> time
 [1] "20000101" "20000102" "20000103" "20000104" "20000105" "20000106" "20000107" "20000108" "20000109" "20000110"
[11] "20000111" "20000112" "20000113" "20000114" "20000115" "20000116" "20000117" "20000118" "20000119" "20000120"
[21] "20000121" "20000122" "20000123" "20000124" "20000125" "20000126" "20000127" "20000128" "20000129" "20000130"

library(lubridate)
time <- ymd(time)

# Either ISO week
isoweek(time)
# or week
week(time)

> isoweek(time)
 [1] 52 52  1  1  1  1  1  1  1  2  2  2  2  2  2  2  3  3  3  3  3  3  3  4  4  4  4  4  4  4
> # or week
> week(time)
 [1] 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3 4 4 4 4 4 4 4 5 5

library(lubridate) time <- ymd(time)

isoweek(time) week(time)

Upvotes: 1

Related Questions