Reputation: 59
I have a dataset from a survey, with the first column containing IDs and the second column containing interview dates formatted as yyyymmdd, ranging from January 1, 2021, to February 28, 2021. Additionally, the temperature data is spread across 59 columns, with each column labeled as Xyyyymmdd, representing each day in January and February.
Now, I need to compute the total number of days the temperature has exceeded 30 degrees Celsius over the past 7 days for each individual. It's worth noting that there won't be any temperature measurements for the first 7 days of the year due to the lack of data in my dataset for that period.
The interview date is not sequential but the temperature cols are arranges in sequential dates.
Example is given below to understand the structure of the data:
ID | Interview date | X20210101 | X20210102 | X20210103 |
---|---|---|---|---|
1 | 20210101 | 25 | 31 | 35 |
2 | 20210102 | 24 | 26 | 31 |
3 | 20210201 | 24 | 26 | 31 |
4 | 20210202 | 26 | 21 | 31 |
I am using R.
Upvotes: 0
Views: 68
Reputation: 17204
Pivot to long, convert to date, then sum up rows > 30 degrees and within 0-7 days of the interview date.
library(tidyr)
library(dplyr)
library(lubridate)
dat %>%
pivot_longer(X20210101:X20210103, names_to = "Date", values_to = "Temp", names_prefix = "X") %>%
mutate(across(`Interview date`:Date, ymd)) %>%
summarize(
HotDays = sum(Temp > 30 & `Interview date` - Date <= 7 & `Interview date` >= Date),
.by = c(ID, `Interview date`)
)
# # A tibble: 4 × 3
# ID `Interview date` HotDays
# <dbl> <date> <int>
# 1 1 2021-01-01 0
# 2 2 2021-01-02 0
# 3 3 2021-02-01 0
# 4 4 2021-02-02 0
If you want to retain all your date columns, you could merge this back to your original dataframe, ie left_join(dat, dat_hot, join_by(ID, `Interview date`)
.
Upvotes: 0
Reputation: 73437
First make Interview_date to "Date"
class.
> dat$Interview_date <- as.Date(dat$Interview_date, '%Y%m%d')
Next, make a vector of your "X" dates,
> xdates <- as.Date(names(dat)[-(1:2)], 'X%Y%m%d')
multiply temperature matrix exceeding 30° by xdates
lying within timeframe [Interview_date, Interview_date - 7d] and take the rowSums
.
> dat$exceed_30 <- ((dat[-(1:2)] >= 30)*
+ t(mapply(\(x, y) x <= y & x >= y - 7, list(xdates), dat$Interview_date))) |>
+ rowSums()
> dat
ID Interview_date X20210101 X20210102 X20210103 X20210104 exceed_30
1 1.0 2021-01-01 25 31 35 29 0
2 2.0 2021-01-02 29 30 31 29 1
3 2.1 2021-01-04 29 31 31 29 2
4 3.0 2021-02-01 24 26 31 29 0
5 4.0 2021-02-02 26 21 31 29 0
Data:
> dput(dat)
structure(list(ID = c(1, 2, 2.1, 3, 4), Interview_date = c("20210101",
"20210102", "20210104", "20210201", "20210202"), X20210101 = c(25L,
29L, 29L, 24L, 26L), X20210102 = c(31L, 30L, 31L, 26L, 21L),
X20210103 = c(35L, 31L, 31L, 31L, 31L), X20210104 = c(29L,
29L, 29L, 29L, 29L)), row.names = c(NA, -5L), class = "data.frame")
Upvotes: 0