Reputation: 135
I am comparing several algorithms as a function of dataset properties, measuring execution time. Because there can be multiple observations for one value of a property, I created a line graph in which each line follows the average execution time. I also wanted to see the extremes and quartiles, so my first idea was to add candlesticks at the relevant positions to show those values.
I expected that it should look something like this:
My data are in a CSV file containing the relevant values:
size, GSP_min, GSP_firstQuartile, GSP_median, GSP_avg, GSP_thirdQuartile, GSP_max, SPAM_min, SPAM_firstQuartile, SPAM_median, SPAM_avg, SPAM_thirdQuartile, SPAM_max, PREFIX_SPAN_min, PREFIX_SPAN_firstQuartile, PREFIX_SPAN_median, PREFIX_SPAN_avg, PREFIX_SPAN_thirdQuartile, PREFIX_SPAN_max
498101.0, 101.0, 101.0, 385.6666666666667, 340.0, 716.0, 11.0, 11.0, 11.0, 33.666666666666664, 29.0, 61.0, 49.0, 49.0, 49.0, 60.333333333333336, 56.0, 76.0,
730189.0, 189.0, 189.0, 3489.0, 3740.0, 6538.0, 19.0, 19.0, 19.0, 106.66666666666667, 114.0, 187.0, 32.0, 32.0, 32.0, 69.66666666666667, 81.0, 96.0,
Here is my code and how I planned to achieve it:
set terminal png size 1024,1024
set bmargin 5
set key autotitle columnhead
set datafile separator ","
set style line 1 \
linecolor rgb '#00ff00' \
linetype 1 linewidth 2 \
pointtype 7 pointsize 1.5
set style line 2 \
linecolor rgb '#0000ff' \
linetype 1 linewidth 2 \
pointtype 7 pointsize 1.5
set style line 3 \
linecolor rgb '#ff0000' \
linetype 1 linewidth 2 \
pointtype 7 pointsize 1.5
set boxwidth 0.1 relative
set style fill empty
set output 'sizeExp.png'
plot 'size.csv' using 1:4 with lp ls 1, \
'' using 1:9 with lp ls 2, \
'' using 1:14 with lp ls 3, \
'' using ($1-1):3:2:6:5 with candlesticks whiskerbars, \
'' using ($1):8:7:11:10 with candlesticks whiskerbars, \
'' using ($1+1):13:12:16:15 with candlesticks whiskerbars
This is the generated result:
The problem here is twofold:
Is there a way to specify these values relative to the image size?
And a third problem, if someone could give me some advice: I expected the lines to be named "GSP_avg", "SPAM_avg", and "PREFIX_SPAN_avg", but instead I got that mess.
Upvotes: 0
Views: 151
Reputation: 15093
I suggest that you look into the `with boxplot` style, which would calculate quartiles and construct appropriate candlestick-like plots directly from the data.
Here is an online demo for gnuplot boxplots.
See also the answer provided for this earlier question: How to plot grouped boxplot by gnuplot
Unlike the `with candlesticks` plot style, you can provide individual widths for the boxplots. There is also control over clustering and spacing between members of the cluster. From the documentation:
By default only one boxplot is produced that represents all y values from the
second column of the using specification. However, an additional (fourth)
column can be added to the specification. If present, the values of that
column will be interpreted as the discrete levels of a factor variable.
As many boxplots will be drawn as there are levels in the factor variable.
The separation between these boxplots is 1.0 by default, but it can be changed
by `set style boxplot separation`. By default, the value of the factor variable
is shown as a tic label below (or above) each boxplot.
Example
# Suppose that column 2 of 'data' contains either "control" or "treatment"
# The following example produces two boxplots, one for each level of the
# factor
plot 'data' using (1.0):5:(0):2
The default width of the box can be set via `set boxwidth <width>` or may be
specified as an optional 3rd column in the `using` clause of the plot command.
The first and third columns (x coordinate and width) are normally provided as
constants rather than as data columns.
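As a rough sketch of what this could look like for the question: `with boxplot` needs the raw per-run timings, not the precomputed summaries in the question's CSV. The `$Raw` datablock below and all numbers in it are made up for illustration (column 1 is the algorithm name used as the factor, column 2 is one measured time).

```gnuplot
# Hypothetical raw timings, NOT taken from the question:
# column 1 = algorithm name (factor), column 2 = execution time of one run
$Raw <<EOD
GSP 120
GSP 340
GSP 410
GSP 716
SPAM 11
SPAM 29
SPAM 35
SPAM 61
PREFIX_SPAN 49
PREFIX_SPAN 56
PREFIX_SPAN 60
PREFIX_SPAN 76
EOD

set style fill empty
set key off
set xtics noenhanced      # print the underscore in PREFIX_SPAN literally
set ylabel "execution time"

# using  x position : y data : box width : factor column
# One boxplot is drawn per distinct value in column 1.
plot $Raw using (1.0):2:(0.5):1 with boxplot
```

Here gnuplot computes the median and quartiles itself, so the summary columns would no longer have to be prepared in advance.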
Upvotes: 1
Reputation: 25724
Your boxwidth: relative to what? Your x-coordinates (column 1) are in the order of 1e5 to 1e6.
Hence you should set the boxwidth in the order of 50000 to 100000 absolute. Check `help boxwidth`.
Same for the offsets. An offset of ($1+50000) seems to be reasonable.
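If you would rather not hard-code these numbers, one possible approach is to derive the box width and the offset from the range of the x values using `stats`. This is only a sketch, assuming the `size.csv` file from the question; the variable names `xspan`, `bw` and `off` and the factors 0.05 and 1.5 are arbitrary choices of mine.

```gnuplot
set datafile separator ","

# 'skip 1' steps over the header row so stats only sees numbers
stats 'size.csv' using 1 skip 1 nooutput

xspan = STATS_max - STATS_min   # range of the x values (column 1)
bw    = 0.05 * xspan            # box width: 5% of the x range (arbitrary)
off   = 1.5 * bw                # shift between neighbouring candlesticks (arbitrary)

set boxwidth bw absolute
set style fill empty
set key autotitle columnhead noenhanced

# candlesticks: x : box_min : whisker_min : whisker_high : box_high
plot 'size.csv' using ($1-off):3:2:6:5    with candlesticks whiskerbars, \
     ''         using 1:8:7:11:10         with candlesticks whiskerbars, \
     ''         using ($1+off):13:12:16:15 with candlesticks whiskerbars
```

This keeps the boxes in proportion to the data range, which is probably the closest thing to sizing them "relative to the image".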
Switch the key to noenhanced mode. Check `help key`.
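A minimal sketch of the key fix, again assuming the `size.csv` file from the question: in enhanced text mode an underscore starts a subscript, which is presumably what garbles titles such as GSP_avg. The `notitle` on the candlesticks is my own addition to keep the quartile column header out of the key.

```gnuplot
set datafile separator ","
set boxwidth 1e4 absolute
set style fill empty

# noenhanced: print column headers such as GSP_avg literally
set key autotitle columnhead noenhanced top left

plot 'size.csv' using 1:4 with linespoints, \
     ''         using ($1-5e4):3:2:6:5 with candlesticks whiskerbars notitle
```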
I see another challenge: your y-values span more than 3 orders of magnitude, so it will be difficult to see them all at once. In the example below, I tried `set logscale y`, but candlesticks in logscale look strange/unusual/confusing to me. Maybe there is another way to display or group your data.
Script:
### candlesticks grouped/with offset
reset session
$Data <<EOD
size, GSP_min, GSP_firstQuartile, GSP_median, GSP_avg, GSP_thirdQuartile, GSP_max, SPAM_min, SPAM_firstQuartile, SPAM_median, SPAM_avg, SPAM_thirdQuartile, SPAM_max, PREFIX_SPAN_min, PREFIX_SPAN_firstQuartile, PREFIX_SPAN_median, PREFIX_SPAN_avg, PREFIX_SPAN_thirdQuartile, PREFIX_SPAN_max
498101.0, 101.0, 101.0, 385.6666666666667, 340.0, 716.0, 11.0, 11.0, 11.0, 33.666666666666664, 29.0, 61.0, 49.0, 49.0, 49.0, 60.333333333333336, 56.0, 76.0,
730189.0, 189.0, 189.0, 3489.0, 3740.0, 6538.0, 19.0, 19.0, 19.0, 106.66666666666667, 114.0, 187.0, 32.0, 32.0, 32.0, 69.66666666666667, 81.0, 96.0,
EOD
set datafile separator ","
set style line 1 lc rgb '#00ff00' lw 2 pt 7 ps 1.5
set style line 2 lc rgb '#0000ff' lw 2 pt 7 ps 1.5
set style line 3 lc rgb '#ff0000' lw 2 pt 7 ps 1.5
set key autotitle columnhead noenhanced top left
set style fill empty
set boxwidth 1e4
set offsets graph 0.15, graph 0.15, graph 0.1, graph 0.1
set xtics 1e5
set logscale y
plot $Data u 1:4 w lp ls 1, \
'' u 1:9 w lp ls 2, \
'' u 1:14 w lp ls 3, \
'' u ($1-5e4):3:2:6:5 w candlesticks whiskerbars, \
'' u 1:8:7:11:10 w candlesticks whiskerbars, \
'' u ($1+5e4):13:12:16:15 w candlesticks whiskerbars
### end of script
Result:
Upvotes: 1