Reputation: 18239
Below is the kind of data I have. Please ignore how the data are created and whether the numbers seem realistic in any particular context. The question is only about graphics.
set.seed(12)
TrueParameter = rep(c(10,15,18), each=8)
Estimate = rep(c(rnorm(8, 10, 1), rnorm(8, 15, 0.5), rnorm(8, 18, 2)))
LowBound95 = Estimate - abs(rnorm(24, 0, 5))
HighBound95 = Estimate + abs(rnorm(24, 0, 5))
LowBound99 = LowBound95 - abs(rnorm(24, 0, 5))
HighBound99 = HighBound95 + abs(rnorm(24, 0, 5))
dt = data.frame(TrueParameter = TrueParameter, Estimate = Estimate, LowBound95 = LowBound95, HighBound95 = HighBound95, LowBound99 = LowBound99, HighBound99 = HighBound99)
TrueParameter Estimate LowBound95 HighBound95 LowBound99 HighBound99
1 10 8.519432 3.3932082 12.176699 1.2461752 14.43811
2 10 11.577169 10.2402453 14.040165 9.3276472 17.51385
3 10 9.043256 8.0477272 9.256680 7.5311749 10.45175
4 10 9.079995 8.4243818 9.643348 5.2551908 14.67984
5 10 8.002358 7.2733584 10.286494 0.9180895 19.92009
6 10 9.727704 7.9173804 19.829378 5.9976284 20.08653
7 10 9.684651 6.3147455 14.939102 3.7309665 23.94172
8 10 9.371745 -0.9884341 13.045005 -1.8782768 15.80229
9 15 14.946768 12.2416248 17.643017 12.2203346 18.17831
10 15 15.214007 9.8615466 21.785371 3.4912489 25.73099
11 15 14.611140 12.7488565 15.861334 11.7383049 17.08261
12 15 14.353059 11.9273521 15.924082 6.1050227 17.84498
13 15 14.610217 13.2362959 16.642950 13.1193988 22.48913
14 15 15.005976 12.6084131 19.978079 8.1226293 27.56944
15 15 14.923792 10.9332653 19.202634 10.0496430 19.56754
16 15 14.648268 9.6260119 15.633912 4.0574665 18.27229
17 18 20.377758 19.8528371 24.549384 17.1433928 27.17201
18 18 18.681025 12.9010601 22.914975 8.0840684 26.64948
19 18 19.013936 16.1232632 28.784463 14.2410212 34.69653
20 18 17.413390 9.4352614 28.159690 4.5118924 34.93323
21 18 18.447283 16.9047645 23.302884 12.4169675 24.36431
22 18 22.014403 19.7670733 27.739711 19.1207606 28.18712
23 18 20.023958 15.1386918 22.650961 9.9701769 23.93612
24 18 17.395082 16.4450922 18.646682 14.7336458 24.66812
The first column is the known true parameter of the data. The second column is the estimate of this true parameter. Columns three and four give the 95% confidence interval for this estimate, and columns five and six give the 99% confidence interval.
My question is both a programming question and a design question (I hope that doesn't make this post too off-topic): how can I best display these data?
I was thinking about stacking all the true parameters one below the other (whether or not they happen to take the same value). Each true parameter would then be represented by a vertical line. The two confidence intervals would be drawn as horizontal lines (in two colours), with a black dot for the estimate. We could then easily see what fraction of the confidence intervals cover the true parameter. But I welcome a different design! Here is a similar way to display this kind of data. The differences are that the parameter is not a constant in my case and that I'd like to be able to display several confidence intervals.
I usually use ggplot2, but I welcome answers based on any R functions and packages. There might actually exist packages that would be very convenient for this kind of plot.
Upvotes: 2
Views: 885
Reputation: 27398
I know you asked for convenient functions and/or packages, but anyway... here's how I usually do this in base R.
I often plot multiple confidence intervals by varying lwd.
For example:
plot(dt$Estimate, pch=20, ylim=range(pretty(c(dt$LowBound99, dt$HighBound99))),
xlab='', ylab='', las=1, cex.axis=0.8, cex=1.5, xaxt='n')
segments(seq_len(nrow(dt)), dt$LowBound99, y1=dt$HighBound99, lend=1)
segments(seq_len(nrow(dt)), dt$LowBound95, y1=dt$HighBound95, lwd=4, lend=1)
I think it's useful to use lend=1 for segments so that the ends of intervals are clearly defined.
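To see what lend changes, here is a small standalone sketch that draws the same thick segment with each of the three line-end styles. The endpoints 0.2 and 0.8 and the name cap_styles are arbitrary choices for illustration only; the dotted lines mark where the segment nominally starts and stops.

```r
# The three line-end styles accepted by the lend graphical parameter
cap_styles <- c(round = 0, butt = 1, square = 2)

# Empty plot with one x position per style
plot(NA, xlim = c(0.5, 3.5), ylim = c(0, 1),
     xlab = '', ylab = '', xaxt = 'n', las = 1)
axis(1, at = seq_along(cap_styles), labels = names(cap_styles))

# Dotted reference lines at the nominal segment ends (arbitrary values)
abline(h = c(0.2, 0.8), lty = 3)

# Same segment, drawn once per line-end style
for (i in seq_along(cap_styles)) {
  segments(i, 0.2, y1 = 0.8, lwd = 12, lend = cap_styles[[i]])
}
```

With lend=1 (butt) the thick line should stop at the dotted lines, whereas the round and square caps extend past them, which matters when the line end is meant to mark an interval bound.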
You can then overlay true parameter values as points:
points(dt$TrueParameter, pch=21, bg='white')
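If the goal is to see what fraction of intervals contain the true parameter, it may help to compute that directly and colour the intervals that miss. The following is only a sketch: the data are rebuilt as in the question, and the names covers95, covers99, col95 and col99, as well as the choice of red for misses, are my own assumptions rather than anything standard.

```r
# Rebuild the example data as in the question
set.seed(12)
TrueParameter <- rep(c(10, 15, 18), each = 8)
Estimate <- c(rnorm(8, 10, 1), rnorm(8, 15, 0.5), rnorm(8, 18, 2))
LowBound95 <- Estimate - abs(rnorm(24, 0, 5))
HighBound95 <- Estimate + abs(rnorm(24, 0, 5))
LowBound99 <- LowBound95 - abs(rnorm(24, 0, 5))
HighBound99 <- HighBound95 + abs(rnorm(24, 0, 5))
dt <- data.frame(TrueParameter, Estimate, LowBound95, HighBound95,
                 LowBound99, HighBound99)

# TRUE where the interval contains the true parameter
covers95 <- dt$LowBound95 <= dt$TrueParameter & dt$TrueParameter <= dt$HighBound95
covers99 <- dt$LowBound99 <= dt$TrueParameter & dt$TrueParameter <= dt$HighBound99

# Fraction of intervals that contain the true parameter
print(c(cover95 = mean(covers95), cover99 = mean(covers99)))

# Colour intervals that miss the true parameter (red is an arbitrary choice)
col95 <- ifelse(covers95, 'black', 'red')
col99 <- ifelse(covers99, 'black', 'red')

x <- seq_len(nrow(dt))
plot(x, dt$Estimate, pch = 20, cex = 1.5, xlab = '', ylab = '', las = 1,
     xaxt = 'n', ylim = range(pretty(c(dt$LowBound99, dt$HighBound99))))
segments(x, dt$LowBound99, y1 = dt$HighBound99, col = col99, lend = 1)
segments(x, dt$LowBound95, y1 = dt$HighBound95, col = col95, lwd = 4, lend = 1)
points(x, dt$TrueParameter, pch = 21, bg = 'white')
```

Because each 99% interval here is at least as wide as the corresponding 95% interval, any row whose thick segment is black will also have a black thin segment.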
Or plot them as horizontal segments beneath the other elements:
plot(dt$Estimate, pch=20, ylim=range(pretty(c(dt$LowBound99, dt$HighBound99))),
xlab='', ylab='', las=1, xaxt='n',
panel.first=plot(dt$TrueParameter ~ factor(seq_len(nrow(dt))), add=TRUE,
xlab='', ylab='', axes=FALSE, border='gray70', medlwd=4))
segments(seq_len(nrow(dt)), dt$LowBound99, y1=dt$HighBound99, lend=1)
segments(seq_len(nrow(dt)), dt$LowBound95, y1=dt$HighBound95, lwd=4, lend=1)
Above, we take advantage of the horizontal median indicators of boxplots, which are deployed by default when x is a factor and y is numeric. (Since there is only one true value per x, the rest of the box is not drawn.) We could use points with pch='-' or maybe pch=-0x2013L here, but they are a little poorly centred around the plotting coordinate.
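Another option, if the boxplot trick feels too indirect, is to draw the true values as short horizontal segments centred on each x position. This is a different technique from the one above, sketched under my own assumptions: half_width is a hypothetical tuning value, and the data are rebuilt as in the question so the block runs on its own.

```r
# Rebuild the example data as in the question
set.seed(12)
TrueParameter <- rep(c(10, 15, 18), each = 8)
Estimate <- c(rnorm(8, 10, 1), rnorm(8, 15, 0.5), rnorm(8, 18, 2))
LowBound95 <- Estimate - abs(rnorm(24, 0, 5))
HighBound95 <- Estimate + abs(rnorm(24, 0, 5))
LowBound99 <- LowBound95 - abs(rnorm(24, 0, 5))
HighBound99 <- HighBound95 + abs(rnorm(24, 0, 5))
dt <- data.frame(TrueParameter, Estimate, LowBound95, HighBound95,
                 LowBound99, HighBound99)

x <- seq_len(nrow(dt))
half_width <- 0.4  # hypothetical half-length of each true-value tick, in x units

# Set up an empty plot so the layers can be added in the desired order
plot(x, dt$Estimate, type = 'n', xlab = '', ylab = '', las = 1, xaxt = 'n',
     ylim = range(pretty(c(dt$LowBound99, dt$HighBound99))))

# True values first, so that they sit beneath the intervals
segments(x - half_width, dt$TrueParameter, x1 = x + half_width,
         col = 'gray70', lwd = 4, lend = 1)

# 99% intervals (thin), 95% intervals (thick), then the estimates on top
segments(x, dt$LowBound99, y1 = dt$HighBound99, lend = 1)
segments(x, dt$LowBound95, y1 = dt$HighBound95, lwd = 4, lend = 1)
points(x, dt$Estimate, pch = 20, cex = 1.5)
```

Keeping half_width below 0.5 stops neighbouring ticks from touching, and the ticks are centred exactly because their ends are computed from x rather than taken from a glyph.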
Upvotes: 2