Reputation: 19
I'm using R and the breastCancer
data frame. I want to use the function train
from the package caret
, but it doesn't work because of the error below. However, when I use another data frame, the function works.
library(mlbench)
library(caret)
data("breastCancer")
BC = na.omit(breastCancer[,-1])
a = train(Class~., data = as.matrix(BC), method = "svmRadial")
This is the error:
error : In .local(x, ...) : Variable(s) `' constant. Cannot scale data.
Upvotes: 0
Views: 2071
Reputation: 46968
We can start with the data you have:
library(mlbench)
library(caret)
data(BreastCancer)
BC = na.omit(BreastCancer[,-1])
str(BC)
'data.frame': 683 obs. of 10 variables:
$ Cl.thickness : Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 5 5 3 6 4 8 1 2 2 4 ...
$ Cell.size : Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 1 4 1 8 1 10 1 1 1 2 ...
$ Cell.shape : Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 1 4 1 8 1 10 1 2 1 1 ...
$ Marg.adhesion : Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 1 5 1 1 3 8 1 1 1 1 ...
$ Epith.c.size : Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 2 7 2 3 2 7 2 2 2 2 ...
$ Bare.nuclei : Factor w/ 10 levels "1","2","3","4",..: 1 10 2 4 1 10 10 1 1 1 ...
$ Bl.cromatin : Factor w/ 10 levels "1","2","3","4",..: 3 3 3 3 3 9 3 3 1 2 ...
$ Normal.nucleoli: Factor w/ 10 levels "1","2","3","4",..: 1 2 1 7 1 7 1 1 1 1 ...
$ Mitoses : Factor w/ 9 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 5 1 ...
$ Class : Factor w/ 2 levels "benign","malignant": 1 1 1 1 1 2 1 1 1 1 ...
BC
is a data.frame, and you can see that all your predictors are categorical or ordinal. You are trying to fit svmRadial, i.e. an SVM with a radial basis function kernel. Calculating Euclidean distance between categorical features is not trivial, and if you look at the distribution of your categories:
sapply(BC,table)
$Cl.thickness
1 2 3 4 5 6 7 8 9 10
139 50 104 79 128 33 23 44 14 69
$Cell.size
1 2 3 4 5 6 7 8 9 10
373 45 52 38 30 25 19 28 6 67
$Cell.shape
1 2 3 4 5 6 7 8 9 10
346 58 53 43 32 29 30 27 7 58
$Marg.adhesion
1 2 3 4 5 6 7 8 9 10
393 58 58 33 23 21 13 25 4 55
When you train the model, resampling defaults to the bootstrap, so some bootstrap samples will be missing the levels that are sparsely represented, for example category 9 of Marg.adhesion
in the table above. That variable's dummy column then becomes all zero in the resample, hence the error. It most likely doesn't affect the overall result much, since those levels are rare.
One solution is to use cross-validation (it is unlikely that all the rare observations end up in the held-out fold). Also note that you should never convert to a matrix using as.matrix()
when you have a data.frame with factors or characters. caret can handle a data.frame like this directly:
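To see the failure mode concretely, here is a small simulation (hypothetical data, not the actual BC frame) estimating how often a bootstrap resample of 683 rows loses a level carried by only 4 of them, as with category 9 of Marg.adhesion:

```r
# Sketch with hypothetical data: how a rare factor level can vanish from a
# bootstrap resample. 4 of 683 rows carry level "9".
set.seed(42)
x <- factor(rep(c("1", "9"), times = c(679, 4)))

# Fraction of 1000 bootstrap resamples that contain no "9" at all
missing9 <- replicate(1000, {
  idx <- sample(length(x), replace = TRUE)
  !"9" %in% x[idx]
})
mean(missing9)  # roughly exp(-4), i.e. around 2% of resamples
# In such a resample the dummy column for level "9" is constant (all zero),
# which is what triggers the "Cannot scale data" error during scaling.
```

With 25 bootstrap reps (the caret default), the chance that at least one rep hits this problem is therefore non-trivial even though each individual rep is usually fine.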
train(Class ~.,data=BC,method="svmRadial",trControl=trainControl(method="cv"))
Support Vector Machines with Radial Basis Function Kernel
683 samples
9 predictor
2 classes: 'benign', 'malignant'
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 614, 615, 615, 615, 616, 615, ...
Resampling results across tuning parameters:
C Accuracy Kappa
0.25 0.9575654 0.9101995
0.50 0.9619346 0.9190284
1.00 0.9633838 0.9220161
Tuning parameter 'sigma' was held constant at a value of 0.01841092
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were sigma = 0.01841092 and C = 1.
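As an aside, the warning above about as.matrix()
can be verified directly: on a data.frame containing factors it coerces every column to character.

```r
# Sketch: as.matrix() on a data.frame with factors coerces the whole
# matrix to character, losing the factor structure caret needs.
df <- data.frame(x = factor(c("1", "2")),
                 y = factor(c("benign", "malignant")))
m <- as.matrix(df)
typeof(m)  # "character": every cell is now a string
```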
The other option, if you want to keep bootstrap resampling, is to either omit the observations with these rare levels or combine them with neighboring levels.
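For the combining option, a minimal base-R sketch (using a hypothetical factor with Marg.adhesion-like counts; on the real data you would apply the same to BC$Marg.adhesion):

```r
# Hypothetical factor mimicking Marg.adhesion's sparse tail:
# 393 rows of "1", 25 of "8", only 4 of "9".
f <- factor(rep(c("1", "8", "9"), times = c(393, 25, 4)),
            levels = as.character(1:10))

# Fold the rare level "9" into its neighbor "8": assigning duplicate
# level labels merges the corresponding levels.
lev <- levels(f)
lev[lev == "9"] <- "8"
levels(f) <- lev

table(f)["8"]  # the combined level now holds 25 + 4 = 29 observations
```

After merging, no bootstrap resample can produce an all-zero dummy column for the former level "9", because it no longer exists as a separate level.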
Upvotes: 1
Reputation: 8198
Your code contains some typos: the package name is caret
, not caren
, and the dataset name is BreastCancer
, not breastCancer
. You can use the following code to get rid of the errors:
library(mlbench)
library(caret)
data(BreastCancer)
BC = na.omit(BreastCancer[,-1])
a = train(Class~., data = as.matrix(BC), method = "svmRadial")
It returns:
#> Support Vector Machines with Radial Basis Function Kernel
#>
#> 683 samples
#> 9 predictor
#> 2 classes: 'benign', 'malignant'
#>
#> No pre-processing
#> Resampling: Bootstrapped (25 reps)
#> Summary of sample sizes: 683, 683, 683, 683, 683, 683, ...
#> Resampling results across tuning parameters:
#>
#> C Accuracy Kappa
#> 0.25 0.9550137 0.9034390
#> 0.50 0.9585504 0.9107666
#> 1.00 0.9611485 0.9161541
#>
#> Tuning parameter 'sigma' was held constant at a value of 0.02349173
#> Accuracy was used to select the optimal model using the largest value.
#> The final values used for the model were sigma = 0.02349173 and C = 1.
Upvotes: 0