nima

Reputation: 19

R train, svmRadial "Cannot scale data"

I'm using R and the breastCancer data frame. I want to use the function train from the caret package, but it fails with the error below. However, when I use another data frame, the function works.

library(mlbench)
library(caret)

data("breastCancer")
BC = na.omit(breastCancer[,-1])
a = train(Class~., data = as.matrix(BC), method = "svmRadial")

This is the error:

error : In .local(x, ...) : Variable(s) `' constant. Cannot scale data.

Upvotes: 0

Views: 2071

Answers (2)

StupidWolf

Reputation: 46968

We can start with the data you have:

library(mlbench)
library(caret)

data(BreastCancer)
BC = na.omit(BreastCancer[,-1])

str(BC)

'data.frame':   683 obs. of  10 variables:
 $ Cl.thickness   : Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 5 5 3 6 4 8 1 2 2 4 ...
 $ Cell.size      : Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 1 4 1 8 1 10 1 1 1 2 ...
 $ Cell.shape     : Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 1 4 1 8 1 10 1 2 1 1 ...
 $ Marg.adhesion  : Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 1 5 1 1 3 8 1 1 1 1 ...
 $ Epith.c.size   : Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 2 7 2 3 2 7 2 2 2 2 ...
 $ Bare.nuclei    : Factor w/ 10 levels "1","2","3","4",..: 1 10 2 4 1 10 10 1 1 1 ...
 $ Bl.cromatin    : Factor w/ 10 levels "1","2","3","4",..: 3 3 3 3 3 9 3 3 1 2 ...
 $ Normal.nucleoli: Factor w/ 10 levels "1","2","3","4",..: 1 2 1 7 1 7 1 1 1 1 ...
 $ Mitoses        : Factor w/ 9 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 5 1 ...
 $ Class          : Factor w/ 2 levels "benign","malignant": 1 1 1 1 1 2 1 1 1 1 ...

BC is a data.frame, and you can see that all of your predictors are categorical or ordinal. You are trying to fit svmRadial, i.e. an SVM with a radial basis function kernel. Calculating Euclidean distances between categorical features is not trivial, and if you look at the distribution of your categories:

sapply(BC,table)
$Cl.thickness

  1   2   3   4   5   6   7   8   9  10 
139  50 104  79 128  33  23  44  14  69 

$Cell.size

  1   2   3   4   5   6   7   8   9  10 
373  45  52  38  30  25  19  28   6  67 

$Cell.shape

  1   2   3   4   5   6   7   8   9  10 
346  58  53  43  32  29  30  27   7  58 

$Marg.adhesion

  1   2   3   4   5   6   7   8   9  10 
393  58  58  33  23  21  13  25   4  55 

When you train the model, the default resampling is the bootstrap, so some of the resampled training sets will be missing the levels that are poorly represented, for example category 9 of Marg.adhesion in the table above. The corresponding dummy variable is then all zeros in that resample, hence the error. It most likely doesn't affect the overall result much (since these levels are rare).
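
As a rough illustration, you can draw a single bootstrap resample yourself and check which dummy columns end up constant. This is only a sketch: the seed is arbitrary, and whether a given level disappears depends on the draw.

set.seed(42)                                  # arbitrary seed, just for illustration
idx  <- sample(nrow(BC), replace = TRUE)      # one bootstrap resample, like caret's default
boot <- BC[idx, ]

# expand the factors into the design matrix that the formula interface builds
X <- model.matrix(Class ~ ., data = boot)[, -1]

# zero-variance columns are the ones kernlab cannot scale;
# they appear when a rare level is absent from the resample
names(which(apply(X, 2, var) == 0))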

One solution is to use cross-validation (it is unlikely that all the rare observations end up in the same held-out fold). Also note that you should never convert a data.frame containing factors or characters into a matrix with as.matrix(); caret can handle a data.frame directly:

train(Class ~.,data=BC,method="svmRadial",trControl=trainControl(method="cv"))
Support Vector Machines with Radial Basis Function Kernel 

683 samples
  9 predictor
  2 classes: 'benign', 'malignant' 

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 614, 615, 615, 615, 616, 615, ... 
Resampling results across tuning parameters:

  C     Accuracy   Kappa    
  0.25  0.9575654  0.9101995
  0.50  0.9619346  0.9190284
  1.00  0.9633838  0.9220161

Tuning parameter 'sigma' was held constant at a value of 0.01841092
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were sigma = 0.01841092 and C = 1.
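
If you want a more stable estimate, trainControl also lets you change the number of folds or repeat the cross-validation; the numbers below are just an example:

train(Class ~ ., data = BC, method = "svmRadial",
      trControl = trainControl(method = "repeatedcv", number = 10, repeats = 3))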

The other option, if you want to keep using the bootstrap for resampling, is to either omit the observations in these rare categories or combine them with other levels.
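
For example, a rough sketch of the second approach (the cut-off of 8 is arbitrary, chosen only to merge the sparsest categories, and the merged column becomes a plain factor):

BC2 <- BC
# merge the sparsely populated high categories of Marg.adhesion into one level
x <- as.numeric(as.character(BC2$Marg.adhesion))
BC2$Marg.adhesion <- factor(ifelse(x >= 8, "8+", as.character(x)))

train(Class ~ ., data = BC2, method = "svmRadial")

This reduces the chance of hitting the error, though other sparse variables (e.g. Mitoses) may need the same treatment.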

Upvotes: 1

UseR10085

Reputation: 8198

Your code contains some typos: the package name is caret, not caren, and the dataset name is BreastCancer, not breastCancer. You can use the following code to get rid of the errors:

library(mlbench)
library(caret)

data(BreastCancer)
BC = na.omit(BreastCancer[,-1])
a = train(Class~., data = as.matrix(BC), method = "svmRadial")

It returns:

#> Support Vector Machines with Radial Basis Function Kernel 
#> 
#> 683 samples
#>   9 predictor
#>   2 classes: 'benign', 'malignant' 
#> 
#> No pre-processing
#> Resampling: Bootstrapped (25 reps) 
#> Summary of sample sizes: 683, 683, 683, 683, 683, 683, ... 
#> Resampling results across tuning parameters:
#> 
#>   C     Accuracy   Kappa    
#>   0.25  0.9550137  0.9034390
#>   0.50  0.9585504  0.9107666
#>   1.00  0.9611485  0.9161541
#> 
#> Tuning parameter 'sigma' was held constant at a value of 0.02349173
#> Accuracy was used to select the optimal model using the largest value.
#> The final values used for the model were sigma = 0.02349173 and C = 1.

Upvotes: 0
