Reputation: 4926
I am running a series of clustering analyses in weka and I have realized that automatizing it is the way to go if I want to get somewhere. I'll explain a bit how I am working.
I do all the pre-processing manually in R and save it as a csv file, importing it in weka and saving it again as an arff file.
I use weka's GUI, and in general I just open my data with in the arff file and go directly to the clustering tab and play around. (My experience using the CLI is limited).
I am trying to reproduce some results I've got by using the GUI, but now with commands in the CLI. The problem is that I usually ignore a list of attributes when clustering using the GUI. I cannot find a way of selecting a list of attributes to be ignored in the command line.
For example:
java weka.clusterers.XMeans \
-I 10 -M 1000 -J 1000 \
-L 2 -H 9 -B 1.0 -C 0.25 \
-D "weka.core.MinkowskiDistance -R first-last" -S 10 \
-t "/home/pedrosaurio/bigtable.arff"
My experience with weka is limited so I don't know if I am missing some basic understanding of how it works.
Upvotes: 4
Views: 2874
Reputation: 61
To ignore an attribute you have to do it from the distance function
Ignore attributes from command line (Matlab):
COLUMNS = '3-last'; % The indices start from 1, 'first' and 'last' are valid as well. E.g .: first-3,5,6-last
Df = weka.core.EuclideanDistance (); % Setup distance function.
Df.setAttributeIndices (COLUMNS); % Setup distance function.
Ignore attributes from GUI Ignore attributes from GUI
I do not understand why when someone asks how to ignore attributes all the answers say how to modify the dataset, using a filter in the preprocess section.
Upvotes: 0
Reputation: 14721
Data Preprocessing functions are called filters. You need to use filters together with cluster algorithm. See below example.
java weka.clusterers.FilteredClusterer \
-F weka.filters.unsupervised.attribute.Remove -V -R 1,5 \
-W weka.clusterers.XMeans -I 10 -M 1000 -J 1000 -L 2 -H 9 -B 1.0 -C 0.25 \
-D "weka.core.MinkowskiDistance -R first-last" -S 10 \
-t "/home/pedrosaurio/bigtable.arff"
Here we remove attributes 1-5 then use xmeans.
Upvotes: 5