Reputation: 323
I am working on an online class that gives the following function to find outliers via an R package:
if SCRIPT_REAL(
"library(mvoutlier);
sign2(cbind(.arg1))$wfinal01",
sum([Profit])) = 0
then "Outlier"
else "Normal"
end
The class does not do a very good job of explaining what is going on here though.
I understand that their are two arguments for this function: string, and expression.
The string is, I believe, the R code. So the mvoutlier library is loaded, a semicolon tells Tableau to break to a new command and sign2 is a function within the mvoutlier package that will find outliers in large sets of data.
I am thinking that cbind(.arg1) is communicating to Tableau that it should use the current data set in use along with the variable wfinal01.
The second argument than aggregates the sum of profit.
So a couple of questions:
The class is just using the store data that comes with Tableau as an example file.
Upvotes: 0
Views: 1119
Reputation: 1274
Check out the documentation (the 'manual' link) for the mvoutlier::sign2() function. It takes a dataset (matrix
or data frame
, in R-speak) composed of only numeric
variables, and returns a list of three outputs. One of those outputs is called wfinal01
, and it's a vector of 1s and 0s, where a value of 0 means the corresponding row in the dataset contains potential multivariate outliers.
cbind()
is an R function (not associated with the mvoutlier
package) that takes several column vectors as arguments, separated by commas, and combines them into a "data frame", which outside of R means a table or matrix.
SCRIPT_REAL()
in this case takes two arguments. First, the R code in quotes. Second, the sum(profit)
variable.
So what's happening is this:
SCRIPT_REAL()
plugs the variable sum(profit)
(loaded somewhere in Tableau, presumably) into the following function in R: sign2(cbind(.arg1))
. So .arg1
gets replaced with sum(profit)
.
R identifies potential multivariate outliers in sum(profit)
(or whatever is substituted in for .arg1
) and returns (among other things) a vector of 1s and 0s called wfinal01
.
if()
checks for cases where wfinal01
= 0 and outputs accordingly.
The only problem is this: The R code doesn't run! If sign2()
is fed anything other than a table (data frame) or matrix, it spits out an error: Error in svd(xs) : infinite or missing values in 'x'
. In your case, I presume sum(profit)
is just a single variable, a single column vector, which means sign2()
is fed a single column and thus spits back this same error.
I can only guess that the unnecessary cbind()
function was put in there in order to try to trick sign2()
into thinking it's being fed a matrix instead of a single vector. However, this doesn't solve the problem when I try it in R by myself with dummy data.
The error arises because sign2()
checks for multivariate outliers. It doesn't make sense to look for those when you only have a single variable.
Does this code run for you in Tableau with no errors? Here's a reproducible example in R for posterity.
library(mvoutlier)
#> Loading required package: sgeostat
#> sROC 0.1-2 loaded
x = sign2(iris[1:4])
x$wfinal01
#> [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
#> [36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [71] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [106] 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1
#> [141] 1 1 1 1 1 1 1 1 1 1
sign2(cbind(iris$Sepal.Length))
#> Error in svd(xs): infinite or missing values in 'x'
Upvotes: 2