Henrik Ode
Henrik Ode

Reputation: 191

Moving average that resets based on another binary column

I have a dataset containing blood percentage (hemoglobin level) and I am interested in calculating a backwards-looking (ie. not centred) moving average that restarts everytime the patient goes below 7 in hemoglobin level.

Pt_id represents the patient id, hemoglobin_level is the given hemoglobin, and anemia_start is the column that indicates when a given patients hemoglobin first goes below 7 (ie. when anemia_start equals 1).

Example data:

df <- data.frame(pt_id = c(1,1,1,1,1),
                hemoglobin_level = c(8,6,5,8,7),
                anemia_start = c(0,1,0,0,0))

df
  pt_id hemoglobin_level anemia_start
1     1                8            0
2     1                6            1
3     1                5            0
4     1                8            0
5     1                7            0

Expected output column is:

moving_average = c(8, 6, 5.5, 6.3, 6.5)

The moving average is restarted once anemia starts, so the second value is 6 and then the moving average continues.

I know how to create a moving average (using zoo package / slider), but i do not know how to make it restart conditionally based on the "anemia_start column".

Thanks for any help.

Further information:

My professor did this in SAS using a bunch of if statements, but I have had a hard time translating it to R.

In order to understand the expected output, here's a picture of my professor's output (made in SAS) that I would like to reproduce in R. The column I'm having a hard time reproducing is the one called hb_gennemsnit (= Hemoglobin average).

enter image description here

he has created a bunch of intermediary columns in SAS to produce his code. Its in danish but HB = hemoglobin (the one I called hemoglobin level) ptnr slut = patient number end, ptnr start = patient number start, and HB gennemsnit = hemoglobin average.

The hb_gennemsnit column is the moving average column that I am trying to reproduce in R

Upvotes: 0

Views: 101

Answers (1)

diomedesdata
diomedesdata

Reputation: 1075

Using data.table and slider:

library(data.table)
library(slider)

setDT(df)

# Helper function
adder <- function(x) {
  for (i in 1:length(x)) {

    if (x[i] == 0L) {
      if (i != 1L) {
        x[i] <- x[i-1]
      } else {x[i] <- 1}
    } else {
      x[i] <- x[i-1] + 1
    }
  }

  return(x)
}

# Create period index
df[, period := adder(anemia_start), by = pt_id]

# Do moving average
df[, moving_average := slide_vec(
  .x = hemoglobin_level
  .f = mean,
  .before = Inf),
  by = c("pt_id", "period")]

Output:

df
    pt_id hemoglobin_level anemia_start period moving_average
 1:     1                8            0      1       8.000000
 2:     1                6            1      2       6.000000
 3:     1                5            0      2       5.500000
 4:     1                8            0      2       6.333333
 5:     1                7            0      2       6.500000
 6:     2                8            0      1       8.000000
 7:     2                4            1      2       4.000000
 8:     2                3            0      2       3.500000
 9:     2                9            0      2       5.333333
10:     2                9            0      2       6.250000

OP edited the question so that there is a unique value for pt_id. In this case, you can just drop the by = pt_id everywhere, but the original solution will still work.

Upvotes: 2

Related Questions