Reputation: 59

count total number of days temperature exceeds 30C during the last 7 days

I have a dataset from a survey, with the first column containing IDs and the second column containing interview dates formatted as yyyymmdd, ranging from January 1, 2021, to February 28, 2021. Additionally, the temperature data is spread across 59 columns, with each column labeled as Xyyyymmdd, representing each day in January and February.

Now, I need to compute the total number of days the temperature has exceeded 30 degrees Celsius over the past 7 days for each individual. It's worth noting that there won't be any temperature measurements for the first 7 days of the year due to the lack of data in my dataset for that period.

The interview date is not sequential but the temperature cols are arranges in sequential dates.

Example is given below to understand the structure of the data:

ID	Interview date	X20210101	X20210102	X20210103
1	20210101	25	31	35
2	20210102	24	26	31
3	20210201	24	26	31
4	20210202	26	21	31

I am using R.

Upvotes: 0

Answers (2)

zephryl

Reputation: 17204

Pivot to long, convert to date, then sum up rows > 30 degrees and within 0-7 days of the interview date.

library(tidyr)
library(dplyr)
library(lubridate)

dat %>%
  pivot_longer(X20210101:X20210103, names_to = "Date", values_to = "Temp", names_prefix = "X") %>%
  mutate(across(`Interview date`:Date, ymd)) %>%
  summarize(
    HotDays = sum(Temp > 30 & `Interview date` - Date <= 7 & `Interview date` >= Date),
        .by = c(ID, `Interview date`)
  )

# # A tibble: 4 × 3
#      ID `Interview date` HotDays
#   <dbl> <date>             <int>
# 1     1 2021-01-01             0
# 2     2 2021-01-02             0
# 3     3 2021-02-01             0
# 4     4 2021-02-02             0

If you want to retain all your date columns, you could merge this back to your original dataframe, ie left_join(dat, dat_hot, join_by(ID, `Interview date`).

Upvotes: 0

jay.sf

Reputation: 73437

First make Interview_date to "Date" class.

> dat$Interview_date <- as.Date(dat$Interview_date, '%Y%m%d')

Next, make a vector of your "X" dates,

> xdates <- as.Date(names(dat)[-(1:2)], 'X%Y%m%d')

multiply temperature matrix exceeding 30° by xdates lying within timeframe [Interview_date, Interview_date - 7d] and take the rowSums.

> dat$exceed_30 <- ((dat[-(1:2)] >= 30)*
+   t(mapply(\(x, y) x <= y & x >= y - 7, list(xdates), dat$Interview_date))) |> 
+   rowSums()

Gives

> dat
   ID Interview_date X20210101 X20210102 X20210103 X20210104 exceed_30
1 1.0     2021-01-01        25        31        35        29         0
2 2.0     2021-01-02        29        30        31        29         1
3 2.1     2021-01-04        29        31        31        29         2
4 3.0     2021-02-01        24        26        31        29         0
5 4.0     2021-02-02        26        21        31        29         0

Data:

> dput(dat)
structure(list(ID = c(1, 2, 2.1, 3, 4), Interview_date = c("20210101", 
"20210102", "20210104", "20210201", "20210202"), X20210101 = c(25L, 
29L, 29L, 24L, 26L), X20210102 = c(31L, 30L, 31L, 26L, 21L), 
    X20210103 = c(35L, 31L, 31L, 31L, 31L), X20210104 = c(29L, 
    29L, 29L, 29L, 29L)), row.names = c(NA, -5L), class = "data.frame")

Upvotes: 0

count total number of days temperature exceeds 30C during the last 7 days

Answers (2)

Gives

Related Questions