Reputation: 191
I have a dataset containing blood percentage (hemoglobin level) and I am interested in calculating a backwards-looking (ie. not centred) moving average that restarts everytime the patient goes below 7 in hemoglobin level.
Pt_id
represents the patient id, hemoglobin_level
is the given hemoglobin, and anemia_start
is the column that indicates when a given patients hemoglobin first goes below 7 (ie. when anemia_start
equals 1).
Example data:
df <- data.frame(pt_id = c(1,1,1,1,1),
hemoglobin_level = c(8,6,5,8,7),
anemia_start = c(0,1,0,0,0))
df
pt_id hemoglobin_level anemia_start
1 1 8 0
2 1 6 1
3 1 5 0
4 1 8 0
5 1 7 0
Expected output column is:
moving_average = c(8, 6, 5.5, 6.3, 6.5)
The moving average is restarted once anemia starts, so the second value is 6 and then the moving average continues.
I know how to create a moving average (using zoo package / slider), but i do not know how to make it restart conditionally based on the "anemia_start column".
Thanks for any help.
My professor did this in SAS using a bunch of if statements, but I have had a hard time translating it to R.
In order to understand the expected output, here's a picture of my professor's output (made in SAS) that I would like to reproduce in R. The column I'm having a hard time reproducing is the one called hb_gennemsnit
(= Hemoglobin average).
he has created a bunch of intermediary columns in SAS to produce his code. Its in danish but HB = hemoglobin (the one I called hemoglobin level)
ptnr slut
= patient number end, ptnr start
= patient number start, and HB gennemsnit
= hemoglobin average.
The hb_gennemsnit
column is the moving average column that I am trying to reproduce in R
Upvotes: 0
Views: 101
Reputation: 1075
Using data.table
and slider
:
library(data.table)
library(slider)
setDT(df)
# Helper function
adder <- function(x) {
for (i in 1:length(x)) {
if (x[i] == 0L) {
if (i != 1L) {
x[i] <- x[i-1]
} else {x[i] <- 1}
} else {
x[i] <- x[i-1] + 1
}
}
return(x)
}
# Create period index
df[, period := adder(anemia_start), by = pt_id]
# Do moving average
df[, moving_average := slide_vec(
.x = hemoglobin_level
.f = mean,
.before = Inf),
by = c("pt_id", "period")]
Output:
df
pt_id hemoglobin_level anemia_start period moving_average
1: 1 8 0 1 8.000000
2: 1 6 1 2 6.000000
3: 1 5 0 2 5.500000
4: 1 8 0 2 6.333333
5: 1 7 0 2 6.500000
6: 2 8 0 1 8.000000
7: 2 4 1 2 4.000000
8: 2 3 0 2 3.500000
9: 2 9 0 2 5.333333
10: 2 9 0 2 6.250000
OP edited the question so that there is a unique value for pt_id
. In this case, you can just drop the by = pt_id
everywhere, but the original solution will still work.
Upvotes: 2