Salvador Dali

R plotting mean and standard deviation of simple dataframe without data manipulation

I have a simple data frame, df:

  V1 V2 V3 V4 V5
1  3  3  3  5  6
2  3  4  6 10 12
3  5  6  8 10 11
4  4  5  7  9 11
5  2  3  5  8  9

Each row is a game and each column is a round (iteration) of that game. For example, in game 4 the player scored 7 in the third round.

I am trying to create a plot like this one (the plot is taken from here):

[image: example plot]

The x axis should show the rounds and the y axis the average performance, with the standard deviation drawn as error bars. The average performance for the first round is the mean of column V1 (3.4), for the second round the mean of column V2 (4.2), and so on. The standard deviation is likewise calculated per V column.

Thanks to Beasterfield, I am converting my data in the following way:

library(reshape2)  # melt()
library(plyr)      # ddply()
df$n <- rownames(df)
df <- melt(df, id.vars = "n", value.name = "perf", variable.name = "iter")
dfc <- ddply(df, .(iter), summarise, se = sd(perf) / sqrt(length(perf)), perf = mean(perf))

which gives me the following result:

  iter        se perf
1   V1 0.5099020  3.4
2   V2 0.5830952  4.2
3   V3 0.8602325  5.8
4   V4 0.9273618  8.4
5   V5 1.0677078  9.8

But when I then try to plot it with ggplot

ggplot(dfc, aes(x=iter, y=perf)) +
  geom_errorbar(aes(ymin=perf-se, ymax=perf+se), width=.1) +
  geom_line() +
  geom_point()

I receive the message:

geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?

The plot is built without connecting lines:

[image: resulting plot]

I also want the y axis to have a maximum value of 20.

Answers (1)

Beasterfield

You can melt a data.frame from wide to long format using reshape2::melt:

library( reshape2 )
mdf$n <- rownames(mdf)
mdf <- melt( mdf, id.vars="n", value.name="perf", variable.name="iter" )
mdf

   n iter perf
1  1   R1    4
2  2   R1    2
3  3   R1    1
4  1   R2    5
5  2   R2    3
6  3   R2    1
...

Concerning your actual question,

"I am trying to achieve is without manipulation with the dataframe, but without any luck."

you should know that ggplot is designed to work on data.frames in long format. So first melting and then plotting is the usual procedure. Sometimes there is also a split-apply-combine step between the two, as you have indicated with summarySE. I don't know this function, but I guess it does something like this:

library( plyr )
mdf <- ddply( mdf, .(n), summarise, se = sd( perf )/sqrt(length(perf)), perf = mean(perf) )
mdf
  n        se perf
1 1 1.0198039  6.8
2 2 0.2000000  2.8
3 3 0.7483315  2.4

Plotting the summarised mdf with the plot command you showed gives:

[image: resulting plot]
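
Regarding the geom_path message in the question: iter is a factor, so ggplot treats each level as its own group and geom_line() has nothing to connect within a group. A common remedy is to set group = 1 so that all points belong to one line, and the y axis range can be set with coord_cartesian(). A minimal sketch, assuming ggplot2 is installed and with dfc retyped by hand from the values printed in the question:

```r
library(ggplot2)

# Summary table as printed in the question (values typed in by hand).
dfc <- data.frame(
  iter = factor(c("V1", "V2", "V3", "V4", "V5")),
  se   = c(0.5099020, 0.5830952, 0.8602325, 0.9273618, 1.0677078),
  perf = c(3.4, 4.2, 5.8, 8.4, 9.8)
)

# group = 1 puts all rounds into a single group so geom_line() can connect them.
p <- ggplot(dfc, aes(x = iter, y = perf, group = 1)) +
  geom_errorbar(aes(ymin = perf - se, ymax = perf + se), width = 0.1) +
  geom_line() +
  geom_point() +
  # y axis up to 20; the lower bound of 0 is an assumption, not from the question.
  coord_cartesian(ylim = c(0, 20))

print(p)
```

ylim(0, 20) would probably work here as well, but it drops any data outside the range before drawing, whereas coord_cartesian() only zooms the view.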
