essicolo

Reputation: 818

How to use a multivariate function with ddply?

I'm trying to apply a multivariate function across a data.frame with ddply, in order to detect multivariate outliers per group. I expect to obtain a vector or a new column containing 1 (inliers) and 0 (outliers), using the wfinal01 value returned by the sign1 function of the mvoutlier package. The following code is an example of what I have tried so far, without success:

library(plyr)
library(mvoutlier)
data(coffee)
myFunc <- function(X) sign1(unclass(X), qcrit = 0.975)$wfinal01
ddply(coffee, .(sort), transform, outliers=myFunc(c(Metpyr, `5-Met`, furfu)))

The following error message is returned.

Error in apply(x, 2, mad) : dim(X) must have a positive length

Upvotes: 1

Views: 475

Answers (1)

mnel

Reputation: 115392

Your problem is that c concatenates the three columns into a single numeric vector, whereas sign1 needs a matrix with three columns. You can use cbind to build that matrix.

ddply(coffee, .(sort), transform, outliers=myFunc(cbind(Metpyr, `5-Met`, furfu)))
   Metpyr X5.Met furfu    sort outliers
1   12.50   8.51  6.20 arabica        0
2    5.33  11.80 17.80 arabica        1
3    2.56   7.16 13.67 arabica        0
4    8.59   8.40 14.39 arabica        1
5    8.22  14.86 20.35 arabica        1
6    7.73  12.23 21.02 arabica        1
7    6.07  12.60 14.25 arabica        1
8    5.88  11.19 15.39 arabica        1
9   10.34  11.90  9.81 arabica        1
10   6.26  10.49 16.90 arabica        1
11   5.47  15.04 24.87 arabica        1
12   1.39  12.76 19.51 arabica        1
13   5.10  13.42 16.93 arabica        1
14   3.72  12.65 21.35 arabica        1
15   4.33  12.72 18.47 arabica        1
16   7.38  15.00 21.58 arabica        1
17  12.13  11.68 15.59 blended        1
18  14.41   8.99 16.42 blended        1
19   8.86   6.98  8.40 blended        1
20  15.47   5.89  5.37 blended        1
21   7.55  13.74 22.26 blended        1
22  14.47   8.76 11.28 blended        1
23  11.34  12.62 14.15 blended        1
24  14.25   8.02  8.69 blended        1
25   6.85  13.38 23.83 blended        1
26   9.93   9.05  7.52 blended        1
27   8.59  14.29 18.50 blended        1

A plain vector has no dim attribute (dim() returns NULL), whereas apply requires an object that has one, such as a matrix or array. Hence the error.
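To make the difference concrete, here is a minimal base-R sketch. The vectors a, b and d are made-up stand-ins for the three coffee columns, not the real data, and mad is used only because it is the function named in the error message.

```r
# Three made-up columns of equal length (stand-ins for Metpyr, 5-Met, furfu)
a <- c(1.2, 3.4, 5.6)
b <- c(2.1, 4.3, 6.5)
d <- c(0.5, 0.7, 0.9)

v <- c(a, b, d)      # one long vector of length 9, no dim attribute
m <- cbind(a, b, d)  # 3 x 3 matrix, one column per variable

dim(v)  # NULL
dim(m)  # 3 3

# apply() fails on the vector because it has no dimensions;
# capture the error message instead of stopping the script
res <- tryCatch(apply(v, 2, mad), error = function(e) conditionMessage(e))

# On the matrix, apply() works column by column
col_mads <- apply(m, 2, mad)
```

Inside ddply(..., transform, ...) the same thing happens per group: cbind(Metpyr, `5-Met`, furfu) hands the function a matrix with one row per observation in that group.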


Edit -- reference by columns

I think referencing columns by number is dangerous, but it is possible if you use data.table.

data.table will generally be faster and more memory-efficient than ddply.

library(data.table)
CD <- data.table(coffee)

CD[, outlier := sign1(.SD, qcrit = 0.975)$wfinal01, by = sort, .SDcols = 1:3]
CD
    Metpyr 5-Met furfu    sort outlier
 1:  12.50  8.51  6.20 arabica       0
 2:   5.33 11.80 17.80 arabica       1
 3:   2.56  7.16 13.67 arabica       0
 4:   8.59  8.40 14.39 arabica       1
 5:   8.22 14.86 20.35 arabica       1
 6:   7.73 12.23 21.02 arabica       1
 7:   6.07 12.60 14.25 arabica       1
 8:   5.88 11.19 15.39 arabica       1
 9:  10.34 11.90  9.81 arabica       1
10:   6.26 10.49 16.90 arabica       1
11:   5.47 15.04 24.87 arabica       1
12:   1.39 12.76 19.51 arabica       1
13:   5.10 13.42 16.93 arabica       1
14:   3.72 12.65 21.35 arabica       1
15:   4.33 12.72 18.47 arabica       1
16:   7.38 15.00 21.58 arabica       1
17:  12.13 11.68 15.59 blended       1
18:  14.41  8.99 16.42 blended       1
19:   8.86  6.98  8.40 blended       1
20:  15.47  5.89  5.37 blended       1
21:   7.55 13.74 22.26 blended       1
22:  14.47  8.76 11.28 blended       1
23:  11.34 12.62 14.15 blended       1
24:  14.25  8.02  8.69 blended       1
25:   6.85 13.38 23.83 blended       1
26:   9.93  9.05  7.52 blended       1
27:   8.59 14.29 18.50 blended       1
    Metpyr 5-Met furfu    sort outlier

You could just as easily (and more explicitly) pass c('Metpyr', '5-Met', 'furfu') as the argument to .SDcols.
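As a rough sketch of the by-name form, here is a toy example that does not need mvoutlier. The table DT, its values, and the helper rowFlag are all hypothetical; rowFlag merely stands in for any function that takes a matrix and returns one value per row, as sign1(...)$wfinal01 does.

```r
library(data.table)

# Toy data: column names mimic the coffee data, values are made up
DT <- data.table(
  Metpyr  = c(1, 2, 3, 4, 5, 6),
  `5-Met` = c(2, 1, 4, 3, 6, 5),
  grp     = rep(c("a", "b"), each = 3)
)

# Hypothetical stand-in for sign1(...)$wfinal01:
# flags rows whose row sum exceeds the mean row sum within the group
rowFlag <- function(X) as.integer(rowSums(X) > mean(rowSums(X)))

# .SDcols takes a character vector of column names (note the quotes, not backticks);
# .SD then contains only those columns for each group
DT[, flag := rowFlag(as.matrix(.SD)), by = grp, .SDcols = c("Metpyr", "5-Met")]
DT
```

Selecting by name keeps the call working if the column order changes, which is the risk with .SDcols = 1:3.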

Upvotes: 3
