Reputation: 11
I have a data frame (http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data).
str(df) yields...
'data.frame': 699 obs. of 11 variables:
$ Code # : int 1000025 1002945 1015425 1016277 1017023 1017122 1018099 1018561 1033078 1033078 ...
$ Clump Thickness : int 5 5 3 6 4 8 1 2 2 4 ...
$ Uniformity of Cell Size : int 1 4 1 8 1 10 1 1 1 2 ...
$ Uniformity of Cell Shape : int 1 4 1 8 1 10 1 2 1 1 ...
$ Marginal Adhesion : int 1 5 1 1 3 8 1 1 1 1 ...
$ Single Epithelial Cell Size: int 2 7 2 3 2 7 2 2 2 2 ...
$ Bare Nuclei : int 1 10 2 4 1 10 10 1 1 1 ...
$ Bland Chromatin : int 3 3 3 3 3 9 3 3 1 2 ...
$ Normal Nucleoli : int 1 2 1 7 1 7 1 1 1 1 ...
$ Mitoses : int 1 1 1 1 1 1 1 1 5 1 ...
$ Class : Factor w/ 2 levels "2","4": 1 1 1 1 1 2 1 1 1 1 ...
I am attempting to perform 10-fold cross-validation to predict "Class", a factor in which 2 is benign and 4 is malignant.
I have already split the data frame into 10 test sets (folds), and am using the predict() function with a Naive Bayes classifier to get the predicted (posterior) class probabilities for each test set.
predict(nb, a, type = c("raw"))
Here nb is the Naive Bayes classifier and a is the first test set.
Here are the first few values from the prediction for reference:
2 4
[1,] 1.000000e+00 3.671148e-09
[2,] 1.390736e-19 1.000000e+00
[3,] 1.000000e+00 1.238558e-09
[4,] 1.459450e-24 1.000000e+00
[5,] 1.000000e+00 9.585543e-09
[6,] 2.451592e-75 1.000000e+00
[7,] 1.379640e-03 9.986204e-01
[8,] 1.000000e+00 7.171687e-10
I am having trouble finding the average of these predicted probabilities for both the benign (2) and malignant (4) classes. How can I average those columns and print the values?
Upvotes: 1
Views: 331
Reputation: 3619
There is a pretty useful R function for that, called colMeans. Assuming that the result is stored in the object res, then
colMeans(res)
will give you the desired column means.
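As a quick sanity check, here is a minimal sketch that types in the eight rows printed in the question by hand (so res here is a stand-in, not real predict() output, and avg is just an illustrative name) and averages each column:

```r
# Stand-in for the predict(nb, a, type = "raw") output: the eight rows
# shown in the question, entered manually for illustration.
res <- matrix(
  c(1.000000e+00, 3.671148e-09,
    1.390736e-19, 1.000000e+00,
    1.000000e+00, 1.238558e-09,
    1.459450e-24, 1.000000e+00,
    1.000000e+00, 9.585543e-09,
    2.451592e-75, 1.000000e+00,
    1.379640e-03, 9.986204e-01,
    1.000000e+00, 7.171687e-10),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("2", "4"))
)

# colMeans() returns a named numeric vector, one mean per column.
avg <- colMeans(res)
print(avg)
```

The result keeps the column names, so avg["2"] is the average benign probability and avg["4"] the average malignant one.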
Upvotes: 1
Reputation: 134
Assuming your output is a matrix, simply:
> b<-predict(nb, a, type = c("raw"))
> mean(b[,1])
[1] 0.5001725
> mean(b[,2])
[1] 0.4998276
Just assign your output to a variable, and use [, i] to select the i-th column.
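If you want the averages over all 10 test sets rather than just the first, something along these lines might work. This is only a sketch: fold_preds is a hypothetical list holding one prediction matrix per test set, filled here with made-up numbers (two small folds instead of ten) so it runs on its own.

```r
# Hypothetical per-fold prediction matrices (made-up values, not real output).
fold_preds <- list(
  matrix(c(0.9, 0.1,
           0.2, 0.8),
         ncol = 2, byrow = TRUE, dimnames = list(NULL, c("2", "4"))),
  matrix(c(0.7, 0.3,
           0.3, 0.7,
           0.5, 0.5),
         ncol = 2, byrow = TRUE, dimnames = list(NULL, c("2", "4")))
)

# Column means for each fold: one column per fold, one row per class.
per_fold <- sapply(fold_preds, colMeans)
print(per_fold)

# Column means over every row of every fold stacked together.
pooled <- colMeans(do.call(rbind, fold_preds))
print(pooled)
```

Note that the pooled means and the mean of the per-fold means can differ when the folds are not all the same size, which will be the case with 699 rows split 10 ways.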
Hope this helps
Upvotes: 0