Reputation: 1141
I would like to calculate the Moving Average of my dataset that is composed by a column representing an index of grouped patients and a second column representing some measurements of a blood circulating molecule. Patients are grouped according to consecutive measurements of the molecule of interest.
Moreover I would like to plot the output in which the measurement per group is plotted against the patients group number.
I tried to code this analysis but I'm not sure I'm doing well.
SURG_DATE VES_2A Index
21/05/2013 1 1
10/06/2013 1 1
06/01/2014 1 1
29/01/2014 0 1
11/03/2014 3 2
05/04/2014 1 2
06/04/2014 1 2
14/05/2014 1 2
28/05/2014 3 3
02/09/2014 2 3
16/09/2014 2 3
17/09/2014 0 3
21/10/2014 2 5
05/12/2014 0 5
19/12/2014 2 5
11/01/2015 1 5
15/01/2015 1 6
17/01/2015 2 6
24/01/2015 1 6
19/02/2015 1 6
The code I tried:
tapply(test$VES_2A,
test$Index,
function(x) rollmean(x, 12, na.pad=TRUE))
Upvotes: 0
Views: 271
Reputation: 72613
It's a little ambiguous, but I think you want this:
test <- cbind(time=rownames(test), test) # first add a time variable
# then create a list with rolling mean for each id and time
ls1 <- lapply(seq_along(test$time),
function(x) cbind(time=x, # time variable
with(test[test$time %in% 1:x, ],
aggregate(list(VES_2A=VES_2A),
list(Index=Index), mean)) # rolling mean
))
tot <- transform(t(sapply(ls1, colMeans)), Index="total") # occasionally add a total column
long <- rbind(do.call(rbind, ls1), tot) # bind all rows together into long format data frame
wide <- reshape2::dcast(long, time ~ Index) # reshape to wide w/ e.g. reshape2::dcast()
rm(ls1, tot) # clean up
Yielding
> wide
time 1 2 3 5 6 total
1 1 1.00 NA NA NA NA 1.000000
2 2 1.00 NA NA NA NA 1.000000
3 3 1.00 NA NA NA NA 1.000000
4 4 0.75 NA NA NA NA 0.750000
5 5 0.75 3.000000 NA NA NA 1.875000
6 6 0.75 2.000000 NA NA NA 1.375000
7 7 0.75 1.666667 NA NA NA 1.208333
8 8 0.75 1.500000 NA NA NA 1.125000
9 9 0.75 1.500000 3.000000 NA NA 1.750000
10 10 0.75 1.500000 2.500000 NA NA 1.583333
11 11 0.75 1.500000 2.333333 NA NA 1.527778
12 12 0.75 1.500000 1.750000 NA NA 1.333333
13 13 0.75 1.500000 1.750000 2.000000 NA 1.500000
14 14 0.75 1.500000 1.750000 1.000000 NA 1.250000
15 15 0.75 1.500000 1.750000 1.333333 NA 1.333333
16 16 0.75 1.500000 1.750000 1.250000 NA 1.312500
17 17 0.75 1.500000 1.750000 1.250000 1.000000 1.250000
18 18 0.75 1.500000 1.750000 1.250000 1.500000 1.350000
19 19 0.75 1.500000 1.750000 1.250000 1.333333 1.316667
20 20 0.75 1.500000 1.750000 1.250000 1.250000 1.300000
Plot
library(ggplot2)
ggplot(long, aes(time, VES_2A, color=Index)) +
geom_line()
Tell me what you think, hope that's what you've wanted.
Data
test <- structure(list(VES_2A = c(1L, 1L, 1L, 0L, 3L, 1L, 1L, 1L, 3L,
2L, 2L, 0L, 2L, 0L, 2L, 1L, 1L, 2L, 1L, 1L), Index = c(1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 6L, 6L,
6L, 6L)), class = "data.frame", row.names = c(NA, -20L))
Upvotes: 2
Reputation: 269441
Using the data shown reproducibly in the Note at the end this takes the rolling mean of the current and prior two observations separately for each Index value and adds a sequence number. Since every value of Index takes up 4 rows we use 1:4.
It's not clear from the question what is to be plotted but we plot the rolling mean vs. seq for each Index on a single panel. For classic grpahics, replace screen = 1
with screen = colnames(wide)
if you want separate panels. For ggplot2 to get separate panels omit facet=NULL
.
library(zoo)
roll <- function(x) rollmeanr(x, 3, fill = NA)
df3 <- transform(df, mean3 = ave(VES_2A, Index, FUN = roll), seq = 1:4)
wide <- na.omit(read.zoo(df3[-1], index = "seq", split = "Index"))
# classic graphics
plot(wide, screen = 1, type = "o", pch = colnames(wide))
# ggplot2 gtraphics
library(ggplot2)
autoplot(wide[-3], facet = NULL)
Lines <- " VES_2A Index
1 1
1 1
1 1
0 1
3 2
1 2
1 2
1 2
3 3
2 3
2 3
0 3
2 5
0 5
2 5
1 5
1 6
2 6
1 6
1 6"
df <- read.table(text = Lines, header = TRUE)
Upvotes: 1