Reputation: 91
I'm still learning R code so maybe this question is rather simple but I just can't figure it out.
I want to plot the mean scores with confidence intervals from a questionnaire that was taken at three different time points: at baseline, after 4 cycles of therapy and after 8 cycles of therapy. The questionnaire contains 3 scales: sensory, motor and autonomic. I want to plot the mean score of each scale per time point, as a line graph with the time points (at baseline; after 4 cycles; after 8 cycles) on the X-axis, the scores on the Y-axis, and three differently colored lines for the sensory, motor and autonomic scales. I want to use ggplot.
I have a dataframe with the following columns:
This is what I'm after:
I hope someone can help me! Many thanks in advance!
Upvotes: 0
Views: 1541
Reputation: 173803
It's always a good idea to include your actual data in a question such as this, but the following should be pretty close to what you have:
set.seed(123)
df <- data.frame(ID = factor(1:60),
                 c0sen = rbinom(60, 15, 8.8/15),
                 c4sen = rbinom(60, 15, 9.2/15),
                 c8sen = rbinom(60, 15, 10/15),
                 c0mot = rbinom(60, 15, 8.1/15),
                 c4mot = rbinom(60, 15, 8.4/15),
                 c8mot = rbinom(60, 15, 8.6/15),
                 c0aut = rbinom(60, 15, 3/15),
                 c4aut = rbinom(60, 15, 3/15),
                 c8aut = rbinom(60, 15, 3.5/15))
head(df)
#> ID c0sen c4sen c8sen c0mot c4mot c8mot c0aut c4aut c8aut
#> 1 1 10 8 9 6 8 7 1 3 2
#> 2 2 7 12 11 9 8 13 2 3 5
#> 3 3 9 10 11 7 10 7 5 3 3
#> 4 4 7 10 11 9 8 7 2 2 4
#> 5 5 6 8 11 8 9 8 2 5 6
#> 6 6 12 9 6 8 7 9 4 3 2
Now, this is simply in the wrong format for plotting with ggplot. You first need to get the data into long format and then summarize it. Here we reshape the data into long format using reshape2::melt, then summarize it with summarize from dplyr:
library(reshape2)
library(dplyr)
summary_df <- melt(df, id.vars = "ID") %>%
  # variable names look like "c4sen": character 2 is the cycle, 3-5 the scale
  mutate(time = as.numeric(substr(variable, 2, 2))) %>%
  transmute(ID, time, modality = as.factor(substr(variable, 3, 5)),
            score = value) %>%
  group_by(modality, time) %>%
  # mean with a normal-approximation 95% confidence interval
  summarize(mean = mean(score),
            upper = mean + 1.96 * sd(score)/sqrt(length(score)),
            lower = mean - 1.96 * sd(score)/sqrt(length(score)))
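The 1.96 multiplier used here is the normal approximation to a 95% interval. With 60 scores per group that is close, but a t-based multiplier is slightly wider. A minimal sketch in base R, assuming a hypothetical helper called ci95_t that is not part of the original answer:

```r
# Hypothetical helper (not from the original answer): 95% CI for a mean
# using the t distribution instead of the fixed 1.96 multiplier.
ci95_t <- function(x) {
  n <- length(x)
  se <- sd(x) / sqrt(n)
  mult <- qt(0.975, df = n - 1)  # about 2.00 for n = 60
  c(mean = mean(x), lower = mean(x) - mult * se, upper = mean(x) + mult * se)
}

set.seed(123)
scores <- rbinom(60, 15, 8.8 / 15)  # simulated the same way as the c0sen column
ci95_t(scores)
```

Inside summarize() the same idea would mean replacing 1.96 with qt(0.975, df = length(score) - 1).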
This gives us something to work with:
summary_df
#> # A tibble: 9 x 5
#> # Groups: modality [3]
#> modality time mean upper lower
#> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 aut 0 2.93 3.35 2.52
#> 2 aut 4 2.87 3.25 2.48
#> 3 aut 8 3.45 3.89 3.01
#> 4 mot 0 7.95 8.38 7.52
#> 5 mot 4 8.48 8.99 7.98
#> 6 mot 8 8.62 9.15 8.09
#> 7 sen 0 8.7 9.18 8.22
#> 8 sen 4 9.17 9.63 8.71
#> 9 sen 8 10.1 10.5 9.70
Now we plot using a combination of geom_line, geom_point and geom_errorbar:
library(ggplot2)
# Note: in ggplot2 >= 3.4.0, line thickness is set with linewidth = rather
# than size = (size still works for lines but gives a deprecation warning)
ggplot(summary_df, aes(x = time, y = mean, colour = modality)) +
  geom_line(size = 1) +
  geom_point(aes(shape = modality), size = 3) +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2, size = 1) +
  theme_classic() +
  scale_color_discrete(labels = c("Autonomic", "Motor", "Sensory")) +
  scale_shape_discrete(labels = c("Autonomic", "Motor", "Sensory")) +
  theme(legend.position = "bottom", text = element_text(size = 12)) +
  labs(x = "Cycles", y = "Symptom score")
Giving us the desired result:
Created on 2020-07-02 by the reprex package (v0.3.0)
Upvotes: 1
Reputation: 13823
This is what I came up with using made-up data. Thank you for sharing the structure of your data, but in the future it is best to share the data itself. You can do that by running dput(your.data.frame) in the console and pasting the output into the question as code, or by creating a dummy dataset with code, which is what I'm doing here.
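As a small illustration of the dput() suggestion, here is a sketch using a made-up three-row data frame (example_df and its values are hypothetical stand-ins, not the asker's data):

```r
# Hypothetical small data frame standing in for the real questionnaire data
example_df <- data.frame(id = 1:3, c0sen = c(10, 7, 9), c4sen = c(8, 12, 10))

# dput() prints R code that recreates the object; paste that output into the question
dput(example_df)

# For a large data frame, sharing only the first rows is usually enough
dput(head(example_df, 2))
```

Anyone reading the question can then assign the pasted output to a variable and work with the same data.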
library(tidyr)
library(dplyr)
library(ggplot2)
raw_df <- data.frame(
  id=1:60,
  c0sen=rnorm(60, 7, 0.2),
  c4sen=rnorm(60, 8.5, 0.5),
  c8sen=rnorm(60, 11, 1.2),
  c0mot=rnorm(60, 6, 0.3),
  c4mot=rnorm(60, 7.5, 0.5),
  c8mot=rnorm(60, 9.6, 0.8),
  c0aut=rnorm(60, 3, 0.1),
  c4aut=rnorm(60, 2.9, 0.1),
  c8aut=rnorm(60, 3.5, 0.8)
)
Before you plot, you need to prepare the dataset for ggplot2. Like other Tidyverse packages, ggplot2 expects data that follows Tidy Data principles, so that is what I will do here using the tidyr and dplyr packages.
As arranged, your data has the same kind of information spread across multiple columns, which we need to gather() together, and each column name holds multiple pieces of information (time and type of measurement), which we need to separate() apart.
The first step is to gather the data into a "long" format, with a column for the measure (c0aut, c8mot, etc.) and a column for the score, while keeping the id column. Then we separate that measure column into two columns: one for the time and one for the type of measurement.
df <- raw_df %>%
  gather(key='measure', value='score', -id) %>%
  separate(col=measure, into=c('c_time','type'), sep=2)
Finally, I want c_time to hold just the number, which we can do as follows:
df <- df %>%
  separate(c_time, into=c('c', 'time'), sep=1) %>%
  select(-c)
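Since tidyr 1.0.0, gather() is superseded by pivot_longer(), which can do the reshaping and the name splitting in one call. A minimal sketch of that alternative, assuming a small stand-in data frame (toy and its values are hypothetical) with the same column naming scheme:

```r
library(tidyr)

# Small stand-in for raw_df (hypothetical values), same column naming scheme
toy <- data.frame(id = 1:2,
                  c0sen = c(7.1, 6.9), c4sen = c(8.4, 8.6),
                  c0aut = c(3.0, 3.1), c4aut = c(2.9, 2.8))

long <- pivot_longer(
  toy,
  cols = -id,
  names_to = c("time", "type"),
  names_pattern = "c(\\d+)(.*)",  # "c4sen" -> time "4", type "sen"
  values_to = "score"
)
long
```

The result has the same id, time, type and score columns as the gather() and separate() approach, with time still stored as a character vector.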
Note that df$time is actually a character vector (not numeric), but that's okay: we want ggplot2 to treat it as a discrete, ordered variable rather than a numeric value, since 0, 4 and 8 should be evenly spaced on the x axis.
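One caveat: a character column is ordered alphabetically on a discrete axis. For "0", "4" and "8" that coincides with numeric order, but it would not if a two-digit time point were added. A small sketch with a hypothetical 12-cycle visit, showing how an explicit factor fixes the order:

```r
times <- c("0", "4", "12")   # hypothetical: a 12-cycle visit added
sort(times)                  # character sort puts "12" before "4"

# An explicit factor keeps the intended order on a discrete axis
time_f <- factor(times, levels = c("0", "4", "12"))
levels(time_f)
```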
Since you mentioned this is new for you, I'm going to break the plot code into parts so that it's easy to follow the steps taken to create the plot. First, we start with the basis, where we set the dataframe and the common aesthetics used throughout. Note that color= is mapped to type, but so is group=. This is necessary so that ggplot2 knows the data should also be grouped by type (rather than treated as a whole). It's very important for the geoms we'll be drawing.
p <- ggplot(df, aes(x=time, y=score, color=type, group=type))
Stats and geoms.
We then draw the data on the plot area with 3 calls to stat_summary, which draw the lines, the error bars and the points (in that order). The error bars show mean +/- standard error (mean_se), although other functions can certainly be used. We also have to override the color= aesthetic for the error bars, since we want them all black (not colored by type), and add the shape= aesthetic to the points so that shape maps to type to match your plot.
p <- p +
  stat_summary(
    geom='line', fun=mean) +
  stat_summary(
    geom='errorbar', fun.data=mean_se,
    color='black', width=0.1) +
  stat_summary(
    geom='point', fun=mean, aes(shape=type))
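Error bars from mean_se span one standard error on each side of the mean, which is narrower than the 95% confidence interval the question asks for. One option is the mult argument of mean_se, passed through fun.args. A minimal sketch on made-up data (x, toy and p_ci are hypothetical names used only for illustration):

```r
library(ggplot2)

# mean_se() returns the mean (y) and mean -/+ mult * standard error (ymin, ymax)
x <- c(6, 8, 7, 9, 10)
mean_se(x)               # default mult = 1: mean +/- 1 SE
mean_se(x, mult = 1.96)  # approximate 95% interval (normal approximation)

# Hypothetical toy data to show the same idea inside stat_summary()
toy <- data.frame(time = rep(c("0", "4", "8"), each = 5),
                  score = c(x, x + 1, x + 2),
                  type = "sen")
p_ci <- ggplot(toy, aes(x = time, y = score, group = type)) +
  stat_summary(geom = "line", fun = mean) +
  stat_summary(geom = "errorbar", fun.data = mean_se,
               fun.args = list(mult = 1.96), width = 0.1)
p_ci
```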
Scales.
For the scales, I set the x axis properties by relabelling the "0", "4", "8" values, and I reduce the expansion from the default because it looks a bit better. The scale_color and scale_shape calls must be changed together and consistently (same name and labels), or you will lose the connection between the two scales and ggplot2 will show two separate legends.
type_labels <- c('Autonomic', 'Motor', 'Sensory')
p <- p +
  scale_x_discrete(
    name=NULL, labels=c('Baseline', '4 cycles', '8 cycles'),
    expand=expansion(mult=0.05)) +
  scale_color_manual(name=NULL, labels=type_labels, values=rainbow(3)) +
  scale_shape_discrete(name=NULL, labels=type_labels)
Theme elements.
Finally, I set the theme elements, which includes the title and axis label, the overall clean look of theme_bw(), and a box around the legend, which I position at the bottom.
p <- p +
  labs(
    title='EORTC QLQ-CIPN20',
    y='Symptom Score'
  ) +
  theme_bw() +
  theme(
    legend.position='bottom',
    legend.title=element_blank(),
    legend.background = element_rect(color='black')
  )
p
This all gives you the following:
Upvotes: 2