tcek

Reputation: 51

Quantiles on Continuous Time Data

I have blood concentration versus time data for 100 subjects. I am interested in plotting the 5%, 50% and 95% quantile concentration vs. time curves. While I can determine the quantiles over the entire concentration range, I am unable to figure out how to stratify the concentration quantiles by time in R. Any help would be appreciated.

a<-quantile(conc~time, 0.05) 

does not work.

Upvotes: 2

Views: 794

Answers (3)

marbel

Reputation: 7714

This is another approach, using data.table. I'm not sure if this is what you are looking for, but one option is to bin the time variable into 3 categories (or however many you need) with cut() and then calculate the quantiles within each bin.

Define your function

qt <- function(x) quantile(x, probs = c(0.05, 0.5, 0.95))

Create Data

library(data.table)
library(ggplot2)  # needed for the plot at the end
DT <- data.table(time = sample(1:100, 100), blood_con = sample(500:1000, 100))
DT$cut_time <- cut(DT$time, right = FALSE, breaks = c(0, 30, 60, 10e5), 
                   labels = c("LOW", "MEDIUM", "HIGH"))

head(DT)

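Apply the qt function to blood_con within each cut_time group

# build the labels inside the grouped call, so each group's three rows are
# labelled directly instead of recycling a length-3 vector over all rows
Q <- DT[, list(blood_con = qt(blood_con),
               quantile_label = c("5%", "50%", "95%")), by = cut_time]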

Plot

ggplot(Q, aes(x = cut_time, y = blood_con, label = quantile_label, color = quantile_label)) + 
  geom_point(size = 4) +
  geom_text(hjust = 1.5)

Upvotes: 0

jlhoward

Reputation: 59365

Assuming a dataframe, df, with columns df$subject, df$time, and df$conc, then

q <- sapply(c(low=0.05,med=0.50,high=0.95),
              function(x){by(df$conc,df$time,quantile,x)})

generates a matrix, q, with columns low, med, and high containing the 5, 50, and 95% quantiles, one row for each time. Full code below.

# generate some moderately realistic data
# concentration declines exponentially over time
# the decay time constant (k) differs between subjects and is distributed as N[50,10]
# measurement error is distributed as N[1, 0.2]
time    <- 1:1000
df      <- data.frame(subject=rep(1:100, each=1000),time=rep(time,100))
k       <- rnorm(100,50,10)   # one time constant per subject
df$conc <- 5*exp(-df$time/k[df$subject])+rnorm(100000,1,0.2)

# generates a matrix with columns low, med, and high 
q <- sapply(c(low=0.05,med=0.50,high=0.95),
            function(x){by(df$conc,df$time,quantile,x)})
# prepend time and convert to dataframe
q <- data.frame(time,q)
# plot the results
library(reshape2)
library(ggplot2)
gg <- melt(q, id.vars="time", variable.name="quantile", value.name="conc")
ggplot(gg) + 
  geom_line(aes(x=time, y=conc, color=quantile))+
  scale_color_discrete(labels=c("5%","50%","95%"))
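
To see the shape of the resulting matrix without simulating 100,000 rows, here is a minimal sketch on a toy data frame. The values are made up for illustration, and tapply() is swapped in for by() as an assumption that it gives the same per-time grouping for a plain vector.

```r
# Toy data (hypothetical values): 5 subjects, each measured at 3 time points
df <- data.frame(subject = rep(1:5, each = 3),
                 time    = rep(1:3, times = 5),
                 conc    = c(9, 5, 2,  10, 6, 3,  11, 7, 4,  12, 8, 5,  13, 9, 6))

probs <- c(low = 0.05, med = 0.50, high = 0.95)

# One column per requested quantile, one row per distinct time point
q <- sapply(probs, function(p) tapply(df$conc, df$time, quantile, probs = p))
print(q)
```

Each row of q is one time point, so the columns can be plotted against the sorted unique times exactly as in the full code.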

Upvotes: 2

crogg01

Reputation: 2526

Ideally some sample data would help to confirm this, but the following should work:

a<-by(conc,time,quantile,0.05)

If conc and time are both in a data frame (call it frame1):

a<-by(frame1$conc,frame1$time,quantile,probs=c(0.05,0.5))
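
When several probabilities are requested, by() returns one quantile vector per time point rather than a table. A minimal sketch of turning that into a matrix follows, using a small made-up frame1 (the values are hypothetical). It also shows aggregate(), whose formula interface is close to the call attempted in the question.

```r
# Toy data (hypothetical values): 4 subjects, each measured at 3 time points
frame1 <- data.frame(
  subject = rep(1:4, each = 3),
  time    = rep(c(0, 1, 2), times = 4),
  conc    = c(10, 6, 3,  12, 7, 4,  8, 5, 2,  11, 6, 3)
)

# by() yields one named quantile vector per time point
a <- by(frame1$conc, frame1$time, quantile, probs = c(0.05, 0.5, 0.95))

# Stack those vectors: one row per time, one column per quantile
qmat <- do.call(rbind, a)
print(qmat)

# Formula interface; the conc column of the result is itself a 3-column matrix
agg <- aggregate(conc ~ time, data = frame1, FUN = quantile,
                 probs = c(0.05, 0.5, 0.95))
print(agg)
```

Either result can then be plotted against time, one line per quantile column.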

Upvotes: 0
