andandandand

Reputation: 22270

Weka data load error

I want to load the data in breast-cancer-wisconsin through Weka Explorer as a C4.5 data file, and I'm getting the following errors whether I choose to load the C4.5 .data file or the C4.5 .names file: [two screenshots of the error messages]

Any ideas?

Upvotes: 2

Views: 3381

Answers (1)

chl

Reputation: 29447

It does not look like the C4.5 names file is correct. Try replacing breast-cancer-wisconsin.names with this one:

2, 4.
clump: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
size: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
shape: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
adhesion: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
epithelial: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
nuclei: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
chromatin: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
nucleoli: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
mitoses: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.

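Note that the class comes first in the names file, given only as its labels (2 for benign, 4 for malignant); in the data file it remains the last column.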

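Here I have removed the first column (the subjects' IDs) from the original dataset using

$ cut -d, -f2-11 breast-cancer-wisconsin.data > tmp.data && mv tmp.data breast-cancer-wisconsin.data

(the output goes to a temporary file first, because redirecting straight back onto the input file would truncate it before cut reads it), but it is not difficult to adapt the names file above if you would rather keep that column.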
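To confirm that the trimmed file lines up with the names file, you can count the rows whose number of comma-separated fields differs from the expected 10 (9 attributes plus the class). This is only a sketch: the function name count_bad_rows is made up for illustration, and it assumes a POSIX shell with awk available.

```shell
# Hypothetical helper: print how many rows of a comma-separated file
# do not have the expected number of fields.
# $1: path to the data file, $2: expected number of fields per row
count_bad_rows() {
  awk -F, -v n="$2" 'NF != n { bad++ } END { print bad + 0 }' "$1"
}
```

Calling `count_bad_rows breast-cancer-wisconsin.data 10` prints 0 when every row has exactly ten fields; any other value suggests the ID column is still present or a row is malformed.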

Alternative solutions:

  1. Generate a csv file: you just need to add a header to the *.data file and rename it as *.csv. E.g., replace breast-cancer-wisconsin.data with breast-cancer-wisconsin.csv which should look like

    clump,size,shape,adhesion,epithelial,nuclei,chromatin,nucleoli,mitoses,class
    5,1,1,1,2,1,3,1,1,2
    5,4,4,5,7,10,3,2,1,2
    3,1,1,1,2,2,3,1,1,2
    6,8,8,1,3,4,3,7,1,2
    ...
    
  2. Construct an *.arff file directly by hand; that's not really complicated as there are few variables. An example file can be found here.
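Both alternatives can be scripted rather than typed out. The sketch below assumes the ID column has already been removed, so the input has 10 columns; the function names data_to_csv and data_to_arff and the ATTRS variable are made up for illustration. The ARFF header declares the attributes as nominal with values 1 to 10, mirroring the names file above; declaring them numeric instead would be an equally reasonable choice. It relies on the word splitting of sh/bash.

```shell
# Column names taken from the names file above; the class is the last column.
ATTRS="clump size shape adhesion epithelial nuclei chromatin nucleoli mitoses"

# Alternative 1: prepend a header line to the ID-free data and save it as CSV.
# $1: input *.data file (10 columns, no ID), $2: output *.csv file
data_to_csv() {
  {
    printf '%s,class\n' "$(printf '%s' "$ATTRS" | tr ' ' ',')"
    cat "$1"
  } > "$2"
}

# Alternative 2: wrap the same rows in a minimal ARFF header.
# Assumption: attributes are nominal (1..10), as in the names file.
# $1: input *.data file (10 columns, no ID), $2: output *.arff file
data_to_arff() {
  {
    printf '@relation breast-cancer-wisconsin\n\n'
    for name in $ATTRS; do
      printf '@attribute %s {1,2,3,4,5,6,7,8,9,10}\n' "$name"
    done
    printf '@attribute class {2,4}\n\n@data\n'
    cat "$1"
  } > "$2"
}
```

For example, `data_to_arff breast-cancer-wisconsin.data breast-cancer-wisconsin.arff` writes a file that can be opened directly in the Explorer. Any `?` entries in the data rows are read as missing values in ARFF.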

Upvotes: 5
