Reputation: 17017
I have a data.frame with two columns and am using aq.plot
from the mvoutlier
package, to identify potential outliers in my 2-D dataset. The only problem is I'm not very happy with the "look" of the plots generated and would like to grab the data they are plotting and make the plot in some other software.
For my specific case the plot is generated by,
library('mvoutlier')
data = read.csv(fp, colClasses=c("NULL",NA,NA))
h = aq.plot(data)
the data.frame, data
looks like this:
pr tas
1 5.133207 59.24362
2 20.173075 75.81661
3 24.819054 97.31020
4 35.893467 92.11203
5 27.752425 95.70120
6 25.765618 91.14163
7 20.895360 57.30519
8 8.921513 70.31467
9 36.031261 98.24573
10 27.166213 92.79554
11 8.889431 54.48514
12 59.564447 85.69632
13 43.818336 99.36451
14 43.408963 84.23207
15 22.653269 84.89939
16 21.480331 96.18303
17 22.827370 69.97202
18 23.252464 85.08739
19 14.618731 45.30504
20 40.795519 78.56758
21 37.310456 80.30799
22 31.099105 91.31675
23 33.107472 63.07043
24 9.611930 35.62702
And the generated plot looks like this:
So my question is, how can I get the information plotted in the subplot on the top right? By information I mean the x, y coordinates and number associated with each point. It would also be great if there were a way to get the x values at which the two vertical lines are drawn.
I see that the output h
from calling the aq.plot()
command gives a boolean array stating which points are outliers (TRUE) or not (FALSE) but there appear to be no access to the underlying components of the plot.
Any help would be much appreciated.
Upvotes: 1
Views: 478
Reputation: 17621
It's all in the code for aq.plot
. Here is the specific code for the plot in the upper right:
plot(s$x, (1:length(dist))/length(dist), col = 3, xlab = "Ordered squared robust distance",
ylab = "Cumulative probability", type = "n")
text(s$x, (1:length(dist))/length(dist), as.character(s$ix),
col = 3, cex = 0.8)
t <- seq(0, max(dist), by = 0.01)
lines(t, pchisq(t, df = ncol(x)), col = 6)
abline(v = delta, col = 5)
text(x = delta, y = 0.4, paste(100 * (pchisq(delta, df = ncol(x))),
"% Quantile", sep = ""), col = 5, pos = 2, srt = 90,
cex = 0.8)
xarw <- arw(x, covr$center, covr$cov, alpha = alpha)
if (xarw$cn < Inf) {
abline(v = xarw$cn, col = 4)
text(x = xarw$cn, y = 0.4, "Adjusted Quantile", col = 4,
pos = 4, srt = 90, cex = 0.8)
}
If you look through the code of the function aq.plot
, you will see that you can get the x coordinates and the associated observation this way:
covr <- robustbase::covMcd(data, alpha = 1/2)
dist <- mahalanobis(data, center = covr$center, cov = covr$cov)
s <- sort(dist, index = TRUE)
s$x
# 22 4 6 10 21 18 15 5 14
# 0.1152036 0.2181437 0.3148553 0.3255492 0.3752751 0.4076276 0.4661830 0.5299942 0.7093746
# 9 20 3 16 2 13 17 23 12
# 0.7564636 0.7756129 0.8838616 1.0807574 1.3059546 1.4891242 1.8606975 2.9690980 3.9152682
# 8 7 1 11 19
# 4.0283820 5.0767176 7.4233298 7.9488595 10.3217389
Then the y coordinates:
(1:length(dist))/length(dist)
#[1] 0.04347826 0.08695652 0.13043478 0.17391304 0.21739130 0.26086957 0.30434783 0.34782609
#[9] 0.39130435 0.43478261 0.47826087 0.52173913 0.56521739 0.60869565 0.65217391 0.69565217
#[17] 0.73913043 0.78260870 0.82608696 0.86956522 0.91304348 0.95652174 1.00000000
You can rebuild that plot directly using the following code altered from above. Reading through this code and following along as you build the plot should help you see where to find each piece of information. Look to the abline
calls for info on the vertical lines, and you'll find the values here qchisq(0.975, df = ncol(data))
and here arw(data, covr$center, covr$cov, alpha = 0.05)$cn
plot(s$x, (1:length(dist))/length(dist), col = 3, xlab = "Ordered squared robust distance",
ylab = "Cumulative probability", type = "n")
text(s$x, (1:length(dist))/length(dist), as.character(s$ix),
col = 3, cex = 0.8)
t <- seq(0, max(dist), by = 0.01)
lines(t, pchisq(t, df = ncol(data)), col = 6)
abline(v = qchisq(0.975, df = ncol(data)), col = 5)
text(x = qchisq(0.975, df = ncol(data)),
y = 0.4, paste(100 * (pchisq(qchisq(0.975, df = ncol(data)), df = ncol(data))),
"% Quantile", sep = ""), col = 5, pos = 2, srt = 90,
cex = 0.8)
xarw <- arw(data, covr$center, covr$cov, alpha = 0.05)
if (xarw$cn < Inf) {
abline(v = xarw$cn, col = 4)
text(x = xarw$cn, y = 0.4, "Adjusted Quantile", col = 4,
pos = 4, srt = 90, cex = 0.8)
}
Upvotes: 3