Ryan

Reputation: 11

10-fold cross-validation averages in R

I have a data frame (http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data).

str(df) yields...

'data.frame':   699 obs. of  11 variables:
 $ Code #                     : int  1000025 1002945 1015425 1016277 1017023 1017122 1018099 1018561 1033078 1033078 ...
 $ Clump Thickness            : int  5 5 3 6 4 8 1 2 2 4 ...
 $ Uniformity of Cell Size    : int  1 4 1 8 1 10 1 1 1 2 ...
 $ Uniformity of Cell Shape   : int  1 4 1 8 1 10 1 2 1 1 ...
 $ Marginal Adhesion          : int  1 5 1 1 3 8 1 1 1 1 ...
 $ Single Epithelial Cell Size: int  2 7 2 3 2 7 2 2 2 2 ...
 $ Bare Nuclei                : int  1 10 2 4 1 10 10 1 1 1 ...
 $ Bland Chromatin            : int  3 3 3 3 3 9 3 3 1 2 ...
 $ Normal Nucleoli            : int  1 2 1 7 1 7 1 1 1 1 ...
 $ Mitoses                    : int  1 1 1 1 1 1 1 1 5 1 ...
 $ Class                      : Factor w/ 2 levels "2","4": 1 1 1 1 1 2 1 1 1 1 ...

I am attempting to perform 10-fold cross-validation to predict "Class", a factor where 2 is benign and 4 is malignant.

I have already split the data frame into 10 test folds, and am using the predict() function with a naive Bayes classifier to get the predicted (posterior) class probabilities for each test fold:

predict(nb, a, type = "raw")

where nb is the naive Bayes classifier and a is the first test fold.

Here are the first few values from the prediction for reference:

                  2            4
  [1,]  1.000000e+00 3.671148e-09
  [2,]  1.390736e-19 1.000000e+00
  [3,]  1.000000e+00 1.238558e-09
  [4,]  1.459450e-24 1.000000e+00
  [5,]  1.000000e+00 9.585543e-09
  [6,]  2.451592e-75 1.000000e+00
  [7,]  1.379640e-03 9.986204e-01
  [8,]  1.000000e+00 7.171687e-10

I am having trouble finding the average of these probabilities for the benign (2) and malignant (4) classes. How can I average each column and print the values?

Upvotes: 1


Answers (2)

hpesoj626

Reputation: 3619

R has a handy function for this, colMeans(). Assuming the prediction result is stored in the object res, then

colMeans(res)

will give you the desired column means.
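As a minimal sketch of what that call returns, the snippet below builds a small stand-in for the prediction matrix. The object name res follows the answer above; the probability values are made up for illustration and are not taken from the real data set.

```r
# Illustrative stand-in for the matrix returned by
# predict(nb, a, type = "raw"); the values are invented.
res <- matrix(
  c(1.00, 0.00,
    0.00, 1.00,
    1.00, 0.00,
    0.25, 0.75),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("2", "4"))
)

# colMeans() returns a named numeric vector: one mean per class column.
avg <- colMeans(res)
print(avg)
```

The result keeps the column names, so avg["2"] and avg["4"] pick out the benign and malignant averages directly.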

Upvotes: 1

SunLisa

Reputation: 134

Assuming your output is a matrix, simply take the mean of each column:

> b <- predict(nb, a, type = "raw")
> mean(b[,1])
[1] 0.5001725
> mean(b[,2])
[1] 0.4998276

Just assign your output to a variable and use [,i] to select the ith column.

Hope this helps.
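To get averages over all ten folds rather than just the first one, the same idea can be applied to a list of per-fold prediction matrices. The sketch below is an assumption about how the folds might be stored: fold_preds and fold_sizes are hypothetical names, and the matrices are filled with random numbers in place of real predict() output.

```r
set.seed(1)

# Hypothetical stand-in: 699 rows split into 10 folds (nine of 70, one of 69).
fold_sizes <- c(rep(70, 9), 69)

# One two-column matrix per fold, mimicking predict(nb, fold, type = "raw").
# Random values are used here in place of real model output.
fold_preds <- lapply(fold_sizes, function(n) {
  p <- runif(n)
  cbind("2" = p, "4" = 1 - p)
})

# Column means for each fold: a 10 x 2 matrix, one row per fold.
per_fold <- t(sapply(fold_preds, colMeans))

# Option 1: average of the ten per-fold means (each fold weighted equally).
mean_of_folds <- colMeans(per_fold)

# Option 2: stack all rows and average once (each observation weighted equally).
pooled <- colMeans(do.call(rbind, fold_preds))

print(mean_of_folds)
print(pooled)
```

The two options differ slightly when the folds are not all the same size, so it is worth deciding which weighting is intended before reporting a single number.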

Upvotes: 0
