Reputation: 89
I have a sample dataset with the columns: PATIENTID (IDs of patients), VISITNUMBER (their number of visits to the hospital), TIME (time in years since first visit), HEALTH (their health status). I am trying to plot HEALTH over time.
This is my code in R:
# data structure
PATIENTID <- c(126, 126, 126, 255, 255, 389, 389, 389, 389, 389, 470, 470, 470)
VISITNUMBER <- c(1, 2, 3, 1, 2, 1, 2, 3, 4, 5, 1, 2, 3)
TIME <- c(0, 4, 6, 0, 3, 0, 1, 2, 3, 4, 0, 1, 2)
HEALTH <- c(0.333, 0.452, 0.468, 0.571, 0.522, 0.444, 0.452, 0.431, 0.510, 0.532, 0.214, 0.333, 0.400)
mydata <- data.frame(PATIENTID, VISITNUMBER, TIME, HEALTH)
# converting patient ID and visit number to factor
mydata$PATIENTID <- factor(mydata$PATIENTID)
mydata$VISITNUMBER <- factor(mydata$VISITNUMBER)
# creating a spaghetti plot of health over time
library(ggplot2)
sp_HEALTH <- ggplot(data = mydata, aes(TIME, HEALTH, group = PATIENTID))
sp_HEALTH +
  geom_line() +
  stat_smooth(aes(group = 1), method = "lm", se = FALSE) +
  # fun replaces the deprecated fun.y argument (ggplot2 >= 3.3.0)
  stat_summary(aes(group = 1), geom = "point", fun = mean,
               shape = 17, size = 3, col = "red")
The plot generated by this code shows the mean points as red triangles. I am trying to connect those mean points with a blue line that goes from point to point, but instead I get a straight, regression-style line. I want the line to connect the points the way a regular line plot does. How do I add a line that connects the mean points?
Thank you!
Upvotes: 0
Views: 1875
Reputation: 33772
Perhaps it is easier to use dplyr::mutate to calculate the mean at each TIME value, then add separate geoms for the patient and mean values?
library(dplyr)
library(ggplot2)
mydata %>%
mutate(PATIENTID = factor(PATIENTID)) %>%
group_by(TIME) %>%
mutate(MEAN = mean(HEALTH)) %>%
ungroup() %>%
ggplot() +
geom_line(aes(TIME, HEALTH, group = PATIENTID)) +
geom_line(aes(TIME, MEAN), color = "blue") +
geom_point(aes(TIME, MEAN), color = "red", size = 3, shape = 17)
Or you could just add a second stat_summary with geom = "line". Note that in both cases aes() is used in the geom, not in ggplot().
mydata %>%
ggplot() +
geom_line(aes(TIME, HEALTH, group=PATIENTID)) +
stat_summary(aes(TIME, HEALTH), geom = "point", fun = mean, shape = 17, size = 3, col = "red") +
stat_summary(aes(TIME, HEALTH), geom = "line", fun = mean, col = "blue")
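If you want to check which values the blue line actually connects, a minimal base R sketch like the one below may help. It rebuilds the sample data from the question and computes the mean HEALTH and the number of observations at each TIME value. The names means, MEAN and N are only illustrative and do not come from the original code.

```r
# Sample data from the question (VISITNUMBER is not needed here)
PATIENTID <- c(126, 126, 126, 255, 255, 389, 389, 389, 389, 389, 470, 470, 470)
TIME <- c(0, 4, 6, 0, 3, 0, 1, 2, 3, 4, 0, 1, 2)
HEALTH <- c(0.333, 0.452, 0.468, 0.571, 0.522, 0.444, 0.452, 0.431, 0.510, 0.532, 0.214, 0.333, 0.400)
mydata <- data.frame(PATIENTID, TIME, HEALTH)

# Mean HEALTH at each distinct TIME value; these are the points the line joins
means <- aggregate(HEALTH ~ TIME, data = mydata, FUN = mean)
names(means)[2] <- "MEAN"  # illustrative column name

# Number of observations behind each mean (aggregate sorts by TIME both times,
# so the rows line up)
means$N <- aggregate(HEALTH ~ TIME, data = mydata, FUN = length)$HEALTH

print(means)
```

In this sample, the mean at TIME = 6 rests on a single patient, so the last segment of the mean line simply runs to that patient's value.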
Upvotes: 1