Reputation: 818
I'm trying to apply a multivariate function across a data.frame with ddply in order to detect multivariate outliers per group. I expect to obtain a vector or a new column containing 1 (inliers) and 0 (outliers), using the wfinal01 value returned by the sign1 function of the mvoutlier package. The following code is an example of what I have tried so far, without success:
library(plyr)
library(mvoutlier)
data(coffee)
myFunc<- function(X) sign1(unclass(X), qcrit=0.975)$wfinal01
ddply(coffee, .(sort), transform, outliers=myFunc(c(Metpyr, `5-Met`, furfu)))
The following error message is returned.
Error in apply(x, 2, mad) : dim(X) must have a positive length
Upvotes: 1
Views: 475
Reputation: 115392
Your problem is that c creates a single numeric vector, whereas you want to pass a matrix containing three columns. You can use cbind to do this.
ddply(coffee, .(sort), transform, outliers=myFunc(cbind(Metpyr, `5-Met`, furfu)))
Metpyr X5.Met furfu sort outliers
1 12.50 8.51 6.20 arabica 0
2 5.33 11.80 17.80 arabica 1
3 2.56 7.16 13.67 arabica 0
4 8.59 8.40 14.39 arabica 1
5 8.22 14.86 20.35 arabica 1
6 7.73 12.23 21.02 arabica 1
7 6.07 12.60 14.25 arabica 1
8 5.88 11.19 15.39 arabica 1
9 10.34 11.90 9.81 arabica 1
10 6.26 10.49 16.90 arabica 1
11 5.47 15.04 24.87 arabica 1
12 1.39 12.76 19.51 arabica 1
13 5.10 13.42 16.93 arabica 1
14 3.72 12.65 21.35 arabica 1
15 4.33 12.72 18.47 arabica 1
16 7.38 15.00 21.58 arabica 1
17 12.13 11.68 15.59 blended 1
18 14.41 8.99 16.42 blended 1
19 8.86 6.98 8.40 blended 1
20 15.47 5.89 5.37 blended 1
21 7.55 13.74 22.26 blended 1
22 14.47 8.76 11.28 blended 1
23 11.34 12.62 14.15 blended 1
24 14.25 8.02 8.69 blended 1
25 6.85 13.38 23.83 blended 1
26 9.93 9.05 7.52 blended 1
27 8.59 14.29 18.50 blended 1
A plain vector has no dim attribute, whereas apply requires a matrix or array, that is, an object whose dim has positive length (hence the error).
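To make the difference concrete, here is a minimal base-R sketch. The three vectors a, b and d are made-up stand-ins for the coffee columns, and mad is used only because it is the function named in the error message; nothing here calls sign1 itself.

```r
# Three hypothetical columns of equal length (made-up values, not the coffee data)
a <- c(1, 2, 3, 4)
b <- c(5, 6, 7, 8)
d <- c(9, 10, 11, 12)

v <- c(a, b, d)      # one long vector of length 12, no dim attribute
m <- cbind(a, b, d)  # 4 x 3 matrix, one column per variable

dim(v)  # NULL
dim(m)  # 4 3

# apply() needs a dim attribute: the vector raises an error, the matrix works
res_v <- tryCatch(apply(v, 2, mad), error = function(e) conditionMessage(e))
res_m <- apply(m, 2, mad)  # one value per column
```

Here res_v holds the error message as a string, while res_m holds one mad value per column, which is the shape of input a column-wise function expects.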
Edit -- reference by columns
I think referencing columns by number is dangerous; however, it is possible if you use data.table. data.table will also be faster and more memory-efficient than ddply.
library(data.table)
CD <- data.table(coffee)
CD[, outlier := sign1(.SD, qcrit = 0.975)$wfinal01, by = sort, .SDcols = 1:3]
CD
Metpyr 5-Met furfu sort outlier
1: 12.50 8.51 6.20 arabica 0
2: 5.33 11.80 17.80 arabica 1
3: 2.56 7.16 13.67 arabica 0
4: 8.59 8.40 14.39 arabica 1
5: 8.22 14.86 20.35 arabica 1
6: 7.73 12.23 21.02 arabica 1
7: 6.07 12.60 14.25 arabica 1
8: 5.88 11.19 15.39 arabica 1
9: 10.34 11.90 9.81 arabica 1
10: 6.26 10.49 16.90 arabica 1
11: 5.47 15.04 24.87 arabica 1
12: 1.39 12.76 19.51 arabica 1
13: 5.10 13.42 16.93 arabica 1
14: 3.72 12.65 21.35 arabica 1
15: 4.33 12.72 18.47 arabica 1
16: 7.38 15.00 21.58 arabica 1
17: 12.13 11.68 15.59 blended 1
18: 14.41 8.99 16.42 blended 1
19: 8.86 6.98 8.40 blended 1
20: 15.47 5.89 5.37 blended 1
21: 7.55 13.74 22.26 blended 1
22: 14.47 8.76 11.28 blended 1
23: 11.34 12.62 14.15 blended 1
24: 14.25 8.02 8.69 blended 1
25: 6.85 13.38 23.83 blended 1
26: 9.93 9.05 7.52 blended 1
27: 8.59 14.29 18.50 blended 1
Metpyr 5-Met furfu sort outlier
You could just as easily (and more explicitly) pass c('Metpyr', '5-Met', 'furfu') as the argument to .SDcols. Note that the names must be quoted strings; a backticked `5-Met` would be evaluated as a variable.
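As a rough illustration of why selecting by name is safer than selecting by position, the base-R sketch below uses a made-up two-row data frame (df, df2 and cols are hypothetical names, and the values are not from the coffee data). It only shows the selection behaviour and does not involve data.table or sign1.

```r
# Hypothetical toy data frame mimicking the coffee column names (values made up).
# check.names = FALSE keeps "5-Met" instead of mangling it to "X5.Met".
df <- data.frame(Metpyr = c(1, 2), `5-Met` = c(3, 4), furfu = c(5, 6),
                 sort = c("arabica", "blended"), check.names = FALSE)

cols <- c("Metpyr", "5-Met", "furfu")

# The same data with the columns in a different order
df2 <- df[, c("sort", "furfu", "Metpyr", "5-Met")]

by_pos  <- names(df2[, 1:3])   # positions now pick up the grouping column
by_name <- names(df2[, cols])  # names still pick the three intended columns
```

After the reorder, the positional selection silently includes sort and drops 5-Met, while the name-based selection is unaffected.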
Upvotes: 3