Bfu38
Bfu38

Reputation: 1141

Moving average with grouped data

I would like to calculate the Moving Average of my dataset that is composed by a column representing an index of grouped patients and a second column representing some measurements of a blood circulating molecule. Patients are grouped according to consecutive measurements of the molecule of interest.

Moreover I would like to plot the output in which the measurement per group is plotted against the patients group number.

I tried to code this analysis but I'm not sure I'm doing well.

  SURG_DATE VES_2A Index
 21/05/2013    1     1
 10/06/2013    1     1
 06/01/2014    1     1
 29/01/2014    0     1
 11/03/2014    3     2
 05/04/2014    1     2
 06/04/2014    1     2
 14/05/2014    1     2
 28/05/2014    3     3
 02/09/2014    2     3
 16/09/2014    2     3
 17/09/2014    0     3
 21/10/2014    2     5
 05/12/2014    0     5
 19/12/2014    2     5
 11/01/2015    1     5
 15/01/2015    1     6
 17/01/2015    2     6
 24/01/2015    1     6
 19/02/2015    1     6

The code I tried:

tapply(test$VES_2A, 
       test$Index, 
       function(x) rollmean(x, 12, na.pad=TRUE))

Upvotes: 0

Views: 271

Answers (2)

jay.sf
jay.sf

Reputation: 72613

It's a little ambiguous, but I think you want this:

test <- cbind(time=rownames(test), test)  # first add a time variable

# then create a list with rolling mean for each id and time
ls1 <- lapply(seq_along(test$time), 
              function(x) cbind(time=x,  # time variable
                                with(test[test$time %in% 1:x, ], 
                                     aggregate(list(VES_2A=VES_2A), 
                                               list(Index=Index), mean))  # rolling mean
                                ))

tot <- transform(t(sapply(ls1, colMeans)), Index="total")  # occasionally add a total column

long <- rbind(do.call(rbind, ls1), tot)  # bind all rows together into long format data frame
wide <- reshape2::dcast(long, time ~ Index)  # reshape to wide w/ e.g. reshape2::dcast()
rm(ls1, tot)  # clean up

Yielding

> wide
   time    1        2        3        5        6    total
1     1 1.00       NA       NA       NA       NA 1.000000
2     2 1.00       NA       NA       NA       NA 1.000000
3     3 1.00       NA       NA       NA       NA 1.000000
4     4 0.75       NA       NA       NA       NA 0.750000
5     5 0.75 3.000000       NA       NA       NA 1.875000
6     6 0.75 2.000000       NA       NA       NA 1.375000
7     7 0.75 1.666667       NA       NA       NA 1.208333
8     8 0.75 1.500000       NA       NA       NA 1.125000
9     9 0.75 1.500000 3.000000       NA       NA 1.750000
10   10 0.75 1.500000 2.500000       NA       NA 1.583333
11   11 0.75 1.500000 2.333333       NA       NA 1.527778
12   12 0.75 1.500000 1.750000       NA       NA 1.333333
13   13 0.75 1.500000 1.750000 2.000000       NA 1.500000
14   14 0.75 1.500000 1.750000 1.000000       NA 1.250000
15   15 0.75 1.500000 1.750000 1.333333       NA 1.333333
16   16 0.75 1.500000 1.750000 1.250000       NA 1.312500
17   17 0.75 1.500000 1.750000 1.250000 1.000000 1.250000
18   18 0.75 1.500000 1.750000 1.250000 1.500000 1.350000
19   19 0.75 1.500000 1.750000 1.250000 1.333333 1.316667
20   20 0.75 1.500000 1.750000 1.250000 1.250000 1.300000

Plot

library(ggplot2)
ggplot(long, aes(time, VES_2A, color=Index)) +
  geom_line()

enter image description here

Tell me what you think, hope that's what you've wanted.

Data

test <- structure(list(VES_2A = c(1L, 1L, 1L, 0L, 3L, 1L, 1L, 1L, 3L, 
                                  2L, 2L, 0L, 2L, 0L, 2L, 1L, 1L, 2L, 1L, 1L), Index = c(1L, 1L, 
                                                                                         1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 6L, 6L, 
                                                                                         6L, 6L)), class = "data.frame", row.names = c(NA, -20L))

Upvotes: 2

G. Grothendieck
G. Grothendieck

Reputation: 269441

Using the data shown reproducibly in the Note at the end this takes the rolling mean of the current and prior two observations separately for each Index value and adds a sequence number. Since every value of Index takes up 4 rows we use 1:4.

It's not clear from the question what is to be plotted but we plot the rolling mean vs. seq for each Index on a single panel. For classic grpahics, replace screen = 1 with screen = colnames(wide) if you want separate panels. For ggplot2 to get separate panels omit facet=NULL.

library(zoo)

roll <- function(x) rollmeanr(x, 3, fill = NA)
df3 <- transform(df, mean3 = ave(VES_2A, Index, FUN = roll), seq = 1:4)

wide <- na.omit(read.zoo(df3[-1], index = "seq", split = "Index"))

# classic graphics
plot(wide, screen = 1, type = "o", pch = colnames(wide))

# ggplot2 gtraphics
library(ggplot2)
autoplot(wide[-3], facet = NULL)

Note

Lines <- "  VES_2A Index
     1     1
     1     1
     1     1
     0     1
     3     2
     1     2
     1     2
     1     2
     3     3
     2     3
     2     3
     0     3
     2     5
     0     5
     2     5
     1     5
     1     6
     2     6
     1     6
     1     6"
df <- read.table(text = Lines, header = TRUE)

Upvotes: 1

Related Questions