Reputation: 222761
I have a simple data frame (called df in the code below):
  V1 V2 V3 V4 V5
1  3  3  3  5  6
2  3  4  6 10 12
3  5  6  8 10 11
4  4  5  7  9 11
5  2  3  5  8  9
This data represents performance in a game: each row is one game and each column is one round. For example, in game 4 the person scored 7 in the third round.
I am trying to create a plot like this (the plot is taken from here):
where the x axis shows the rounds and the y axis shows the average performance, with the standard deviation as error bars. The average performance for the first round is the mean of column V1 (3.4), for the second round the mean of column V2 (4.2), and so on. The standard deviation is likewise calculated per V column.
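For reference, the per-round figures quoted above can be reproduced in base R. This is only a minimal sketch: it assumes the data frame is called df and re-enters the values from the table by hand; the names round_means and round_sds are made up for illustration.

```r
# The question's data, entered by hand (rows are games, columns are rounds).
df <- data.frame(
  V1 = c(3, 3, 5, 4, 2),
  V2 = c(3, 4, 6, 5, 3),
  V3 = c(3, 6, 8, 7, 5),
  V4 = c(5, 10, 10, 9, 8),
  V5 = c(6, 12, 11, 11, 9)
)

round_means <- colMeans(df)    # average performance per round
round_sds   <- sapply(df, sd)  # sample standard deviation per round

print(round_means)  # V1 is 3.4 and V2 is 4.2, matching the values quoted above
print(round_sds)
```

Note that sd() gives the sample standard deviation; dividing it by sqrt(nrow(df)) gives the standard error instead.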
Thanks to BeasterField, I am converting my data in the following way:
library(reshape2)  # melt()
library(plyr)      # ddply()
df$n <- rownames(df)
df <- melt(df, id.vars = "n", value.name = "perf", variable.name = "iter")
dfc <- ddply(df, .(iter), summarise, se = sd(perf) / sqrt(length(perf)), perf = mean(perf))
which gives me the following result:
  iter        se perf
1   V1 0.5099020  3.4
2   V2 0.5830952  4.2
3   V3 0.8602325  5.8
4   V4 0.9273618  8.4
5   V5 1.0677078  9.8
But later, when I try to plot this with ggplot:
ggplot(dfc, aes(x = iter, y = perf)) + geom_errorbar(aes(ymin = perf - se, ymax = perf + se), width = .1) + geom_line() + geom_point()
I receive the message: geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?
The plot is built without connecting lines:
Also, I want my y axis to have a maximum value of 20.
Upvotes: 2
Views: 14126
Reputation: 7113
You can melt a data.frame from wide to long format using reshape2::melt:
library( reshape2 )
mdf$n <- rownames(mdf)
mdf <- melt( mdf, id.vars="n", value.name="perf", variable.name="iter" )
mdf
  n iter perf
1 1   R1    4
2 2   R1    2
3 3   R1    1
4 1   R2    5
5 2   R2    3
6 3   R2    1
...
Concerning your actual question:
"I am trying to achieve is without manipulation with the dataframe, but without any luck."
You should know that ggplot is designed to work on data.frames in long format, so melting first and then plotting is entirely usual. Sometimes there is also a split-apply-combine step between the two, as you have indicated with summarySE. Without knowing this function, I guess it does something like:
library( plyr)
mdf <- ddply( mdf, .(n), summarise, se = sd( perf )/sqrt(length(perf)), perf = mean(perf))
mdf
n se perf
1 1 1.0198039 6.8
2 2 0.2000000 2.8
3 3 0.7483315 2.4
Using the plot command you showed you'll get
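As for the message quoted in the question ("Each group consist of only one observation"): when x is a factor, geom_line() treats every level as its own group, so there is nothing to connect. A common fix is to set group = 1 in aes(). The following is only a sketch under some assumptions: dfc is rebuilt by hand from the summary printed in the question, and the lower y limit of 0 is my own choice (only the upper limit of 20 was asked for).

```r
library(ggplot2)

# Summary values copied from the dfc output printed in the question.
dfc <- data.frame(
  iter = factor(c("V1", "V2", "V3", "V4", "V5")),
  se   = c(0.5099020, 0.5830952, 0.8602325, 0.9273618, 1.0677078),
  perf = c(3.4, 4.2, 5.8, 8.4, 9.8)
)

# group = 1 puts all points into a single group so geom_line() can connect them.
p <- ggplot(dfc, aes(x = iter, y = perf, group = 1)) +
  geom_errorbar(aes(ymin = perf - se, ymax = perf + se), width = 0.1) +
  geom_line() +
  geom_point() +
  # Assumption: lower limit 0; the question only specifies the maximum of 20.
  coord_cartesian(ylim = c(0, 20))

print(p)
```

coord_cartesian() only zooms the view, so no data are dropped; scale_y_continuous(limits = c(0, 20)) would also work here because all values lie inside that range.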
Upvotes: 3