Reputation: 23
I want to make this kind of graph (reference picture: the post "Categorical scatter plot with mean segments using ggplot2 in R") for my data set in RStudio. However, I can't work out how to put my groups (there is more than one) on the x axis and the scale on the y axis.
Here is my data, which is saved on a Windows PC as a CSV file:
GROUP A
22.51506233
21.86862564
21.20981979
21.44734764
21.45001411
19.99370003
GROUP B
18.95846367
20.99542427
20.96941566
21.49574852
21.18944359
21.88916016
19.47029114
19.50328064
GROUP C
20.76145554
19.29909134
21.62098885
26.1908226
21.95579529
20.79806519
24.57015228
22.81287003
21.68307304
GROUP D
20.89354706
20.52819443
22.62171173
21.20273018
20.35452652
20.89900398
21.66306114
19.66979218
19.77578926
19.31722832
21.89787102
20.92485237
20.60872269
19.97720909
21.31039047
21.76075363
22.42200661
22.59609222
21.5938015
22.24318123
22.26913261
21.67864227
18.97455406
21.47759438
Here are the required details:
I haven't tried any code for the graph yet. I am watching videos to learn R, but I haven't found the proper code to make such a graph. The graph I am referring to is in the post "Categorical scatter plot with mean segments using ggplot2 in R".
My data was in Excel; I saved it in CSV format and then imported it into RStudio, where it is stored as BCL6.DATAcvs. I read the file as shown below. It has one column per group; there are 4 groups and each group has a different number of values: A has 6, B has 8, C has 9 and D has 24.
summary(BCL6.DATAcvs)
       A               B               C               D
 Min.   :19.99   Min.   :18.96   Min.   :19.30   Min.   :18.97
 1st Qu.:21.27   1st Qu.:19.50   1st Qu.:20.80   1st Qu.:20.48
 Median :21.45   Median :20.98   Median :21.68   Median :21.26
 Mean   :21.41   Mean   :20.56   Mean   :22.19   Mean   :21.11
 3rd Qu.:21.76   3rd Qu.:21.27   3rd Qu.:22.81   3rd Qu.:21.80
 Max.   :22.52   Max.   :21.89   Max.   :26.19   Max.   :22.62
 NA's   :18      NA's   :16      NA's   :15
Please guide me on how I can make this graph.
Upvotes: 2
Views: 8026
Reputation: 83255
Supposing you have a group column and a value column, let's first reconstruct your data:
A <- data.frame(group="A", value=c(22.51506233,21.86862564,21.20981979,21.44734764,21.45001411,19.99370003))
B <- data.frame(group="B", value=c(18.95846367,20.99542427,20.96941566,21.49574852,21.18944359,21.88916016,19.47029114,19.50328064))
C <- data.frame(group="C", value=c(20.76145554,19.29909134,21.62098885,26.1908226,21.95579529,20.79806519,24.57015228,22.81287003,21.68307304))
D <- data.frame(group="D", value=c(20.89354706,20.52819443,22.62171173,21.20273018,20.35452652,20.89900398,21.66306114,19.66979218,19.77578926,19.31722832,21.89787102,20.92485237,20.60872269,19.97720909,21.31039047,21.76075363,22.42200661,22.59609222,21.5938015,22.24318123,22.26913261,21.67864227,18.97455406,21.47759438))
df <- rbind(A,B,C,D)
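If the data is already imported as a wide data frame with one column per group and NA padding (as the summary of BCL6.DATAcvs in the question suggests), the same long format can probably be reached without retyping the values. The following is only a sketch using base R's stack(); the data frame wide is a hypothetical stand-in built from the first few values of groups A and B, not the real import.

```r
# Hypothetical stand-in for the imported wide data frame:
# one column per group, shorter groups padded with NA.
wide <- data.frame(
  A = c(22.51506233, 21.86862564, NA),
  B = c(18.95846367, 20.99542427, 20.96941566)
)

# stack() puts all columns under each other and returns two columns:
# "values" (the numbers) and "ind" (the name of the source column).
long <- stack(wide)
names(long) <- c("value", "group")

# Drop the NA padding and put the columns in group/value order.
long <- long[!is.na(long$value), c("group", "value")]

print(long)
```

With the real data, replacing wide by BCL6.DATAcvs should give a data frame that can be used in place of df.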
Now you can make a grouped scatterplot with:
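library(ggplot2)

ggplot(df, aes(x=group, y=value, color=group)) +
  # raw values, jittered horizontally so overlapping points stay visible
  geom_point(size=4, alpha=0.7, position=position_jitter(w=0.1, h=0)) +
  # group mean as a filled diamond (fun replaces fun.y, deprecated since ggplot2 3.3.0)
  stat_summary(fun=mean, geom="point", shape=23, color="black", aes(fill=group), size=4) +
  # error bars at mean +/- one standard deviation (fun.min/fun.max replace fun.ymin/fun.ymax)
  stat_summary(fun.min=function(x)(mean(x)-sd(x)),
               fun.max=function(x)(mean(x)+sd(x)),
               geom="errorbar", width=0.1) +
  theme_bw()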
the result:
An explanation of the parameters used:
I used alpha=0.7 in combination with position=position_jitter(w=0.1, h=0) in order to distinguish between the points. The alpha sets the transparency and takes a value between 0 (completely transparent) and 1 (opaque).
With position_jitter you can shift the location of the points slightly. The shift is random, within set boundaries around the exact point. The reason for doing this is that some points overlap; by using position=position_jitter() you make the overlapping points easier to see. The boundaries are set with the w and h parameters. By setting h=0 in position_jitter you ensure that the change in location only happens horizontally, while the vertical location stays exactly at the actual value. To see the effect, run the code without the position=position_jitter(w=0.1, h=0) part and compare it with the plot above.
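To see what the jitter does to the coordinates, one option is to inspect the positions ggplot2 actually draws. This is a sketch on a small illustrative subset of the question's values; pts is a hypothetical name, the full argument names width and height are what w and h partially match, and seed is only added here so the random shift is reproducible.

```r
library(ggplot2)

# Small illustrative subset: first three values of groups A and B.
pts <- data.frame(
  group = c("A", "A", "A", "B", "B", "B"),
  value = c(22.51506233, 21.86862564, 21.20981979,
            18.95846367, 20.99542427, 20.96941566)
)

p <- ggplot(pts, aes(x = group, y = value)) +
  geom_point(position = position_jitter(width = 0.1, height = 0, seed = 1))

# layer_data() returns the data of a layer after positions are applied.
drawn <- layer_data(p, 1)

# Horizontal shift relative to the group's position on the x axis (1, 2, ...).
x_shift <- as.numeric(drawn$x) - round(as.numeric(drawn$x))
print(range(x_shift))   # should stay within -0.1 and 0.1

# With height = 0 the y values should be exactly the original values.
print(all.equal(sort(as.numeric(drawn$y)), sort(pts$value)))
```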
theme_bw() sets the plot to a black-and-white layout instead of the default grey background.
More info about the several parts: geom_point, stat_summary, geom_errorbar and theme(). For more info about the shapes of the points, just type ?pch in the console.
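To check the numbers that the two stat_summary layers draw (the group mean, and the mean minus and plus one standard deviation), they can also be computed directly. The following is a sketch with base R's tapply() on groups A and B only; the names vals and summary_stats are illustrative.

```r
# Values of groups A and B as given in the question.
vals <- rbind(
  data.frame(group = "A", value = c(22.51506233, 21.86862564, 21.20981979,
                                    21.44734764, 21.45001411, 19.99370003)),
  data.frame(group = "B", value = c(18.95846367, 20.99542427, 20.96941566,
                                    21.49574852, 21.18944359, 21.88916016,
                                    19.47029114, 19.50328064))
)

# Per-group mean and standard deviation.
m <- tapply(vals$value, vals$group, mean)
s <- tapply(vals$value, vals$group, sd)

# One row per group: the diamond (mean) and the error bar ends (mean -/+ sd).
summary_stats <- data.frame(
  group = names(m),
  mean  = as.numeric(m),
  lower = as.numeric(m - s),
  upper = as.numeric(m + s)
)

print(summary_stats)
```

The means should agree with the Mean row of the summary() output in the question (21.41 for A and 20.56 for B).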
Upvotes: 5