Albz

Reputation: 2030

How to plot a single datapoint with mean and standard deviation from a data frame in R

I have a large dataframe in R with this format:

"SubjID"    "HR"    "IBI"   "Stimulus"  "Status"
"S1"    75.98   790 1   1
"S1"    75.95   791 1   2
"S1"    65.7    918 1   3
"S1"    59.63   100 1   4
"S1"    59.44   101 1   5
"S1"    59.62   101 2   1
"S1"    63.85   943 2   2
"S1"    60.75   992 2   3
"S1"    59.62   101 2   4
"S1"    61.68   974 2   5
"S2"    65.21   921 1   1
"S2"    59.23   101 1   2
"S2"    61.23   979 1   3
"S2"    70.8    849 1   4
"S2"    74.21   809 1   4

I would like to plot the mean of the "HR" column for each value of the "Status" column.

I wrote the following R code where I create a subset of the data (by different values of "Status") and plot it:

numberOfSeconds <- 8

for (stimNumber in 1:40) {
    # keep only the rows for this stimulus, up to the chosen Status value
    stimulus2plot <- subset(resampledDataFile,
                            Stimulus == stimNumber & Status <= numberOfSeconds,
                            select = c(SubjID, HR, IBI, Stimulus, Status))

    plot(HR ~ Status, data = stimulus2plot, xlab = "", ylab = "")
    lines(HR ~ Status, data = stimulus2plot)
}

This produces plots similar to the following:

[image]

I get one plot per "Stimulus". The X axis of each plot shows the "Status" column, and the Y axis shows one "HR" value for each "SubjID". Almost there...

However, what I ultimately want is a single Y data point for each X value, i.e. Y should be the mean of the HR column, similar to the following plot:

[image]

How can this be achieved? It would also be great to show the standard deviation as error bars at each data point.

Thanks in advance for your help.

Upvotes: 0

Views: 3627

Answers (4)

Dennis

Reputation: 762

You can do this entirely within ggplot2. The following fake-data example can serve as a guide:

library(ggplot2)

DF <- data.frame(stimulus = factor(rep(paste("Stimulus", seq(4)), each = 40)),
                 subject = factor(rep(seq(20), each = 8)),
                 time = rep(seq(8), 20),
                 resp = rnorm(160, 50, 10))
# spaghetti plots: one line per subject, one panel per stimulus
ggplot(DF, aes(x = time, y = resp, group = subject)) +
   geom_line() +
   facet_wrap(~ stimulus, ncol = 1)
# plot of time averages by stimulus
# (fun replaces the fun.y argument, deprecated as of ggplot2 3.3.0)
ggplot(DF, aes(x = time, y = resp)) +
   stat_summary(fun = mean, geom = "line", group = 1) +
   stat_summary(fun = mean, geom = "point", group = 1, shape = 1) +
   facet_wrap(~ stimulus, ncol = 1)
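
The question also asks for standard deviation error bars, which the example above does not draw. A minimal sketch of one way to add them with stat_summary, assuming ggplot2 3.3.0 or later (for the fun, fun.min and fun.max arguments) and the same kind of fake data; the name p and the seed are only for illustration:

```r
library(ggplot2)

set.seed(1)  # arbitrary seed so the fake data is reproducible
DF <- data.frame(stimulus = factor(rep(paste("Stimulus", seq(4)), each = 40)),
                 subject = factor(rep(seq(20), each = 8)),
                 time = rep(seq(8), 20),
                 resp = rnorm(160, 50, 10))

# mean line and points, plus error bars spanning mean - SD to mean + SD
p <- ggplot(DF, aes(x = time, y = resp)) +
   stat_summary(fun = mean, geom = "line") +
   stat_summary(fun = mean, geom = "point") +
   stat_summary(fun = mean,
                fun.min = function(y) mean(y) - sd(y),
                fun.max = function(y) mean(y) + sd(y),
                geom = "errorbar", width = 0.25) +
   facet_wrap(~ stimulus, ncol = 1)
print(p)
```

Because the summary is computed by ggplot2 itself, no separate aggregation step is needed; the trade-off is that the computed means and SDs are not available as a data frame afterwards.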

Upvotes: 0

alexwhan

Reputation: 16036

To get it closest to what you want:

library(ggplot2)
library(plyr)
df.summary <- ddply(df, .(Stimulus, Status), summarise,
                    HR.mean = mean(HR),
                    HR.sd = sd(HR))
ggplot(df.summary, aes(Status, HR.mean)) +
  geom_path() +
  geom_point() +
  geom_errorbar(aes(ymin = HR.mean - HR.sd, ymax = HR.mean + HR.sd), width = 0.25) +
  facet_wrap(~ Stimulus)

[image]

Upvotes: 2

Paul Hiemstra

Reputation: 60964

The easiest approach is to precompute the values first and then plot them. I would use ddply for this kind of analysis:

library(plyr)
res = ddply(df, .(Status), summarise, mn = mean(HR))

and plot it using ggplot2:

library(ggplot2)
ggplot(res, aes(x = Status, y = mn)) + geom_line() + geom_point()
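
The precomputed summary can also carry the standard deviation the question asks for. A minimal base-R sketch, assuming a small toy data frame (values taken from the first rows of the question's table) and using aggregate() in place of ddply; the column names HR.mn and HR.sd come from the flattening step and are not part of the original answer:

```r
# Toy data: Status 1 to 3 for subjects S1 and S2, from the question's table
df <- data.frame(
  SubjID = rep(c("S1", "S2"), each = 3),
  HR     = c(75.98, 75.95, 65.70, 65.21, 59.23, 61.23),
  Status = rep(1:3, times = 2)
)

# Mean and SD of HR for each Status value; FUN returns two numbers,
# so aggregate() stores them as a two-column matrix named HR
res <- aggregate(HR ~ Status, data = df,
                 FUN = function(x) c(mn = mean(x), sd = sd(x)))

# Flatten the matrix column into ordinary columns HR.mn and HR.sd
res <- do.call(data.frame, res)
print(res)
```

With ggplot2 loaded, a summary like this could be plotted with aes(x = Status, y = HR.mn) and a geom_errorbar(aes(ymin = HR.mn - HR.sd, ymax = HR.mn + HR.sd)) layer added to the line and points.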

Upvotes: 2

Theodore Lytras

Reputation: 3965

The simplest way to do it would be tapply(). If your data.frame is data:

means <- with(data, tapply(HR, Status, mean))
plot(means, type="l")

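It is easy to calculate and plot error bars as well (here the standard error of the mean; use sd(x) on its own for the standard deviation):

serr <- with(data, tapply(HR, Status, function(x) sd(x) / sqrt(length(x))))
plot(means, type = "o", ylim = c(50, 80))  # adjust ylim to the range of your data
# one vertical segment per point, from mean - SE to mean + SE
segments(seq_along(means), means - serr, seq_along(means), means + serr)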
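
Since the question asks for the standard deviation rather than the standard error, the same base-graphics approach can be adapted. A minimal sketch, assuming a toy data frame built from the first rows of the question's table; the names hr_data, sds and status are illustrative only:

```r
# Toy data: Status 1 to 3 for two subjects, from the question's table
hr_data <- data.frame(
  HR     = c(75.98, 75.95, 65.70, 65.21, 59.23, 61.23),
  Status = rep(1:3, times = 2)
)

means  <- with(hr_data, tapply(HR, Status, mean))
sds    <- with(hr_data, tapply(HR, Status, sd))
status <- as.numeric(names(means))  # use the actual Status values on the X axis

# y limits chosen so that every error bar fits inside the plot
plot(status, means, type = "o",
     ylim = range(means - sds, means + sds),
     xlab = "Status", ylab = "Mean HR")

# arrows() with angle = 90 and code = 3 draws flat caps at both ends
arrows(status, means - sds, status, means + sds,
       angle = 90, code = 3, length = 0.05)
```

Plotting against the Status values instead of the vector index keeps the X axis correct even if some Status values are missing from the data.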

Upvotes: 2
