Reputation: 22270
I want to load the data in breast-cancer-wisconsin through Weka Explorer as a C4.5 data file and I'm getting the following errors when choosing both to load C4.5 .data and C4.5 .names:
Any ideas?
Upvotes: 2
Views: 3381
Reputation: 29447
It does not look like the C45 names file is correct. Try replacing breast-cancer-wisconsin.names
with this one:
2, 4.
clump: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
size: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
shape: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
adhesion: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
epithelial: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
nuclei: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
chromatin: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
nucleoli: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
mitoses: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
Note that class comes first (only labels).
Here I have removed the first column of subjects' id in the original dataset using
$ cut -d, -f2-11 breast-cancer-wisconsin.data > breast-cancer-wisconsin.data
but it is not difficult to adapt the above code.
Alternative solutions:
Generate a csv file: you just need to add a header to the *.data
file and rename it as *.csv
. E.g., replace breast-cancer-wisconsin.data
with breast-cancer-wisconsin.csv
which should look like
clump,size,shape,adhesion,epithelial,nuclei,chromatin,nucleoli,mitoses,class
5,1,1,1,2,1,3,1,1,2
5,4,4,5,7,10,3,2,1,2
3,1,1,1,2,2,3,1,1,2
6,8,8,1,3,4,3,7,1,2
...
Construct directly an *.arff
file by hand; that's not really complicated as there are few variables. An example file can be found here.
Upvotes: 5