Calculating medians per year per ID in R and plotting the outcome

Question

Dataset:

structure(list(ID = c(1234, 1234, 1234, 1234, 1234, 1234, 1234, 
1234, 8769, 8769, 8769, 8769, 8769, 7457, 7457, 7457, 7457, 7457, 
7457, 55667, 55667, 55667, 55667, 55667, 55667, 55667, 3789, 
3789, 3789, 3789, 3789, 3789), date_of_bloods = structure(c(978307200, 
981072000, 1173052800, 1175731200, 1367798400, 1465171200, 1467936000, 
1659916800, 1072915200, 1075680000, 1173052800, 1175731200, 1367798400, 
978307200, 981072000, 1173052800, 1175731200, 1367798400, 1465171200, 
978307200, 981072000, 1173052800, 1270425600, 1273104000, 1465171200, 
1467936000, 1270425600, 1367798400, 1465171200, 1465257600, 1465344000, 
1465430400), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    result = c(90, 80, 60, 40, 25, 22, 22, 21, 70, 65, 43, 23, 
    22, 90, 90, 88, 86, 76, 74, 58, 46, 35, 34, 33, 30, 24, 76, 
    67, 56, 34, 33, 23), `mutation type` = c(1, 1, 1, 1, 1, 1, 
    1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 
    3, 1, 1, 1, 1, 1, 1)), row.names = c(NA, -32L), class = "data.frame")

I would like the median of the results per year per ID, with the year recoded as 0, 1, 2, 3, etc. for uniformity across cohorts. I would then like to plot these lines with some indication of each ID's mutation category.

I have done:

filtered$date_of_bloods <-format(filtered$date_of_bloods,format="%Y")
#split into individual ID groups
a <- with(filtered, split(filtered, list(ID)))

#aggregate median results per year 
medianfunc <- function(y) {aggregate(results ~ date_of_bloods, data = y, median)}
medians <- sapply(a, medianfunc)

# do lm per ID cohort and get slope of lines 
g<- as.data.frame(medians)
coefLM <- function(x) {coef(lm(date_of_bloods ~ results, data = x))}
coefs<- sapply(g, coefLM)

The actual years don't matter; for uniformity I would like them to be 0, 1, 2, 3, 4, etc. per ID, but I am not sure how to do that. I would then like to plot this data (median yearly bloods per ID) with some indication of which mutation category each ID belongs to.

I hope this isn't too broad a question.

Many thanks

Duck · Accepted Answer

You can try this (filtered is the dput() you included). I hope this helps:

library(dplyr)
library(lubridate)
library(ggplot2)
library(broom)
#Data
filtered %>% mutate(year=year(date_of_bloods)) %>%
group_by(ID,year,`mutation type`) %>% summarise(med=median(result)) -> df1
#Variables
df1 %>% ungroup()%>% mutate(ID=as.factor(ID),
                            year=as.factor(year),
                            `mutation type`=as.factor(`mutation type`)) -> df1
#Plot
ggplot(df1,aes(x=ID,y=med,fill=`mutation type`,color=year,group=year))+
  geom_line()
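
For comparison, the median step can also be done in base R with aggregate(), which is closer to the attempt in the question. This is only a sketch: raw is a small hand-typed subset of the posted data (not the full filtered data frame), and the year column name is an assumption made for the example.

```r
# Small hand-typed subset of the posted data (illustration only)
raw <- data.frame(
  ID = c(1234, 1234, 1234, 8769, 8769),
  date_of_bloods = as.POSIXct(
    c("2001-01-01", "2001-02-02", "2007-03-05", "2004-01-01", "2004-02-02"),
    tz = "UTC"
  ),
  result = c(90, 80, 60, 70, 65)
)

# Extract the calendar year as an integer
raw$year <- as.integer(format(raw$date_of_bloods, "%Y"))

# One median per ID and year; note the column is `result`, not `results`
med <- aggregate(result ~ ID + year, data = raw, FUN = median)
med
```

Grouping on ID and year in a single formula avoids having to split the data by ID first.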

And for models:

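#Models: one linear fit per ID
#year was converted to a factor above, so convert it back to numeric first;
#otherwise lm() estimates one coefficient per year level instead of a slope
#Coefs: group_modify() + tidy() is used because recent broom releases
#no longer accept the older tidy(fits, fitmodel) call on a do() result
dfCoef <- df1 %>% mutate(year = as.numeric(as.character(year))) %>%
  group_by(ID) %>%
  group_modify(~ tidy(lm(med ~ year, data = .x)))
dfCoef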


# A tibble: 10 x 6
# Groups:   ID [5]
      ID term        estimate std.error statistic p.value
   <fct> <chr>          <dbl>     <dbl>     <dbl>   <dbl>
 1  1234 (Intercept)  6329.    1546.         4.09  0.0264
 2  1234 year           -3.13     0.769     -4.07  0.0268
 3  3789 (Intercept) 14318.    4746.         3.02  0.204 
 4  3789 year           -7.08     2.36      -3.00  0.205 
 5  7457 (Intercept)  2409.     403.         5.98  0.0269
 6  7457 year           -1.16     0.201     -5.78  0.0287
 7  8769 (Intercept)  9268.    4803.         1.93  0.304 
 8  8769 year           -4.60     2.39      -1.92  0.306 
 9 55667 (Intercept)  3294.     759.         4.34  0.0492
10 55667 year           -1.62     0.378     -4.29  0.0503
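
If only the slope per ID is needed, as in the question, a base R sketch along the following lines may be enough. The toy data frame is a hand-typed subset of the yearly medians (ID 1234 is deliberately incomplete), so its slope for that ID will not match the table above; the names toy and slopes are made up for this example.

```r
# Hand-typed subset of the yearly medians (year kept numeric on purpose)
toy <- data.frame(
  ID   = c(1234, 1234, 1234, 8769, 8769, 8769),
  year = c(2001, 2007, 2013, 2004, 2007, 2013),
  med  = c(85, 50, 25, 67.5, 33, 22)
)

# Fit med ~ year separately for each ID and keep only the slope
slopes <- sapply(split(toy, toy$ID), function(d) {
  coef(lm(med ~ year, data = d))[["year"]]
})
slopes  # named numeric vector, one slope per ID
```

Note that the response is the median result and the predictor is the year; the attempt in the question had the two the other way round.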

Code for required plot:

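#Plot 2
#Data modifications: recode year as 0,1,2,... using the factor level order
#(this index is shared across all IDs; it does not restart per ID)
df1 %>% mutate(year2=as.numeric(year)-1) -> df2
df2 %>% mutate(year2=factor(year2,levels = sort(unique(year2)))) -> df2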
#Plot 2
ggplot(df2,aes(x=year2,y=med,color=ID,group=ID))+
  facet_wrap(.~`mutation type`)+
  geom_line()
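
If the year index should instead restart at 0 for each ID, as the question asks, one possible approach is to compute it within each ID group. The sketch below uses a small hand-typed subset of the yearly medians with numeric years; toy, year_index and years_since_first are names invented for this example. Which of the two columns is appropriate depends on whether "0, 1, 2, 3" means the order of sampled years or the elapsed calendar years.

```r
library(dplyr)

# Hand-typed subset of the yearly medians, with numeric years
toy <- data.frame(
  ID   = c(1234, 1234, 1234, 8769, 8769),
  year = c(2001, 2007, 2013, 2004, 2007),
  med  = c(85, 50, 25, 67.5, 33)
)

toy_indexed <- toy %>%
  group_by(ID) %>%
  mutate(
    year_index        = dense_rank(year) - 1,  # 0, 1, 2, ... in order of each ID's sampled years
    years_since_first = year - min(year)       # calendar years elapsed since each ID's first sample
  ) %>%
  ungroup()

toy_indexed
```

Either column could then be mapped to x in the ggplot call in place of year2.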
