Reputation: 149
I couldn't find a similar question, so here we go: I have a large dataset in R and I want to prepare it for hazard analysis. I therefore want to create a dichotomous survival variable. However, my hazard event is defined in relative terms and is not simply a certain value being 0. The dataset has the form:
ID y
1 0
1 15
1 30
1 29
1 10
2 11
2 64
2 86
2 79
2 75
plus a number of independent and control variables. An ID enters the subset used for survival analysis when y > 0. Back to the hazard variable: I want it to take the value 1 when decreasing values of y fall below a threshold, which is 75% of the highest value y reaches within each ID group. Thus, two conditions have to be fulfilled for the hazard:
Does anyone have a solution for that? Thanks in advance.
Upvotes: 0
Views: 102
Reputation: 174468
If I understand you correctly, only the last value in group 1 should meet the conditions, since it is decreasing and less than 75% of the group's maximum. In group 2, there are no values that meet these criteria.
The tidyverse solution would look like this:
library(dplyr)
df %>%
group_by(ID) %>%
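# hazard = 1 when y is below 75% of the ID's maximum and lower than the previous row;
# c(0, diff(y)) gives the first row of each ID a change of 0, so it never counts as decreasing
mutate(hazard = +(y < 0.75 * max(y) & c(0, diff(y)) < 0))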
#> # A tibble: 10 x 3
#> # Groups: ID [2]
#> ID y hazard
#> <int> <int> <int>
#> 1 1 0 0
#> 2 1 15 0
#> 3 1 30 0
#> 4 1 29 0
#> 5 1 10 1
#> 6 2 11 0
#> 7 2 64 0
#> 8 2 86 0
#> 9 2 79 0
#> 10 2 75 0
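If you would rather not depend on dplyr, the same two conditions can be sketched in base R with ave(), which applies a function within each ID and returns a vector as long as the input. This is only a minimal sketch on the toy data from the question; the helper names grp_max and grp_diff are made up for illustration.

```r
# Same toy data as in the question
df <- data.frame(
  ID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
  y  = c(0L, 15L, 30L, 29L, 10L, 11L, 64L, 86L, 79L, 75L)
)

# Per-ID maximum of y, repeated for every row of that ID
# (grp_max is a hypothetical helper name)
grp_max <- ave(df$y, df$ID, FUN = max)

# Per-ID change from the previous row; the first row of each ID gets 0
# (grp_diff is a hypothetical helper name)
grp_diff <- ave(df$y, df$ID, FUN = function(v) c(0, diff(v)))

# 1 if y is below 75% of the ID's maximum AND lower than the previous row
df$hazard <- as.integer(df$y < 0.75 * grp_max & grp_diff < 0)

print(df)
```

As with the dplyr version, this assumes the rows are already sorted in time order within each ID.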
Data
df <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
y = c(0L, 15L, 30L, 29L, 10L, 11L, 64L, 86L, 79L, 75L)),
class = "data.frame", row.names = c(NA, -10L))
Created on 2020-07-25 by the reprex package (v0.3.0)
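One thing to be aware of: max(y) is the maximum over the whole group, so the threshold for an early row already "knows" about a peak that only occurs later. If the threshold should instead depend only on the highest value reached so far, a possible variant (my assumption about what you might want, not something stated in the question) swaps max(y) for cummax(y). The column name peak_so_far is made up for illustration.

```r
library(dplyr)

# Same toy data as above
df <- data.frame(
  ID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
  y  = c(0L, 15L, 30L, 29L, 10L, 11L, 64L, 86L, 79L, 75L)
)

res <- df %>%
  group_by(ID) %>%
  mutate(
    # running maximum of y within the ID (hypothetical column name)
    peak_so_far = cummax(y),
    # 1 if y is below 75% of the running peak AND lower than the previous row
    hazard = +(y < 0.75 * peak_so_far & c(0, diff(y)) < 0)
  ) %>%
  ungroup()

print(res)
```

On this toy data both versions flag the same single row (ID 1, y = 10), but they can differ whenever y dips and later climbs to a new peak.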
Upvotes: 1