Reputation: 741
I am trying to create a plot to compare the running times of different algorithms. Running the R code below gives the following plot, which I am generally satisfied with. However, it can be hard to read values off this graph. Is there a way to get the plotted mean values for each DBMS for each instance? For example, for gplus-combined, the value for CacheDBMS is around 50, while for BranchDBMS it is around 200.
ggplot(dt, aes(reorder(instance, V9), V9)) +
  geom_point(aes(group=V2, colour=V2), stat='summary', fun.y='mean') +
  geom_line(aes(group=V2, colour=V2), stat='summary', fun.y='mean') +
  scale_y_log10() +
  ylab("Mean wall time") +
  xlab("") +
  ggtitle("Comparison of Database Management Systems") +
  theme_bw() +
  theme(axis.text.x = element_text(angle=45, vjust = 1, hjust = 1)) +
  guides(color=guide_legend(title="DBMS"))
I want the y-values for each point. Preferably as a table, e.g.
BranchDBMS gplus-combined 213.21
CacheDBMS gplus-combined 48.68
EDIT
A small snippet (out of roughly 10,000 lines) of the input data. I have removed the unused columns, so the V* numbering does not match the column positions shown here: V2 is the first column, V9 is the second, and instance is the last.
BranchDBMS; 0.163352; facebook-combined
BranchDBMS; 0.169043; facebook-combined
BranchDBMS; 0.162545; facebook-combined
BranchDBMS; 0.159489; facebook-combined
BranchDBMS; 0.168414; facebook-combined
CacheDBMS ; 0.038515; facebook-combined
CacheDBMS ; 0.037179; facebook-combined
CacheDBMS ; 0.037385; facebook-combined
CacheDBMS ; 0.036514; facebook-combined
BranchDBMS; 281.149423; gplus-combined
BranchDBMS; 261.093502; gplus-combined
BranchDBMS; 258.816546; gplus-combined
CacheDBMS ; 22.442501; gplus-combined
CacheDBMS ; 22.377717; gplus-combined
CacheDBMS ; 22.469739; gplus-combined
CacheDBMS ; 22.451922; gplus-combined
Upvotes: 1
Views: 2663
Reputation: 93761
Here's an example of how to add the value labels directly to the graph, using the built-in iris data frame:
p1 = ggplot(iris, aes(Sepal.Width, Sepal.Length, colour=Species)) +
  stat_summary(fun.y=mean, geom="line", alpha=0.5) +
  stat_summary(fun.y=mean, geom="text", aes(label=sprintf("%1.1f", ..y..)),
               size=3, show.legend=FALSE) +
  guides(colour=guide_legend(override.aes = list(alpha=1, lwd=1)))
..y.. holds the internally calculated means at each value of Sepal.Width for each Species. Because we used alpha=0.5 for the line geom, override.aes allows us to have bolder lines in the legend.
One way to add a table of data values would be as follows:
library(gridExtra)
library(dplyr)

# Change default font size for the data table
mytheme <- ttheme_default(
  core = list(fg_params=list(cex = 0.7)),
  colhead = list(fg_params=list(cex = 0.75)),
  rowhead = list(fg_params=list(cex = 0.75)))

# Create table (in this case I just show the first three values for each species)
tab = tableGrob(iris %>% group_by(Species, Sepal.Width) %>%
                  summarise(`Mean Sepal Length`=sprintf("%1.1f", mean(Sepal.Length))) %>%
                  slice(1:3), theme=mytheme, rows=NULL)

# Lay out graph and table
grid.arrange(p1, tab, ncol=1)
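If only the numbers are needed, the same per-group means can probably be computed without going through the plot at all. The following is a minimal sketch in base R, assuming the question's data frame is called dt and has the columns V2 (DBMS), V9 (wall time) and instance. Here dt is rebuilt from the sample rows posted in the question purely for illustration, so the results will differ from the full data set.

```r
# Rebuild a small `dt` from the sample rows in the question (illustration only;
# the real `dt` has many more rows and columns).
dt <- read.table(text = "
BranchDBMS; 0.163352; facebook-combined
BranchDBMS; 0.169043; facebook-combined
BranchDBMS; 0.162545; facebook-combined
BranchDBMS; 0.159489; facebook-combined
BranchDBMS; 0.168414; facebook-combined
CacheDBMS ; 0.038515; facebook-combined
CacheDBMS ; 0.037179; facebook-combined
CacheDBMS ; 0.037385; facebook-combined
CacheDBMS ; 0.036514; facebook-combined
BranchDBMS; 281.149423; gplus-combined
BranchDBMS; 261.093502; gplus-combined
BranchDBMS; 258.816546; gplus-combined
CacheDBMS ; 22.442501; gplus-combined
CacheDBMS ; 22.377717; gplus-combined
CacheDBMS ; 22.469739; gplus-combined
CacheDBMS ; 22.451922; gplus-combined
", sep = ";", strip.white = TRUE,
   col.names = c("V2", "V9", "instance"),
   stringsAsFactors = FALSE)

# Mean wall time for every (DBMS, instance) pair, returned as a data frame
# with the columns V2, instance and V9 (V9 now holds the group mean).
means <- aggregate(V9 ~ V2 + instance, data = dt, FUN = mean)

# Sort by instance, then DBMS, and print without row names.
means <- means[order(means$instance, means$V2), ]
print(means, row.names = FALSE)
```

This computes the same statistic that stat='summary' with fun.y='mean' draws, so the table should line up with the plotted points.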
Upvotes: 6