Reputation: 401
When I try this method with dense vector data it runs correctly, but with sparse vector data it throws java.lang.ArrayIndexOutOfBoundsException. What data source can I use to read sparse vector data correctly?
public void runKmeans(double[][] data) {
    ArrayAdapterDatabaseConnection dataArray = new ArrayAdapterDatabaseConnection(data);
    ListParameterization params = new ListParameterization();
    params.addParameter(AbstractDatabase.Parameterizer.DATABASE_CONNECTION_ID, dataArray);
    Database db = ClassGenericsUtil.parameterizeOrAbort(StaticArrayDatabase.class, params);
    db.initialize();
    // Parameterization
    params = new ListParameterization();
    params.addParameter(KMeans.K_ID, k);
    params.addParameter(KMeans.SEED_ID, 0);
    // setup Algorithm
    KMeansOutlierDetection<DoubleVector> kmeansAlg = ClassGenericsUtil.parameterizeOrAbort(KMeansOutlierDetection.class, params);
    //testParameterizationOk(params);
    // run KMEANS on database
    OutlierResult result = kmeansAlg.run(db);
    ...
Upvotes: 2
Views: 190
Reputation: 8715
The class ArrayAdapterDatabaseConnection can only be used for dense vectors. You must supply a rectangular double[][] array, i.e. every row must have the same length.
You can use FileBasedDatabaseConnection and the ArffParser to read sparse data. Or you can implement your own DatabaseConnection; it has only a single method, loadData().
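If the data really is dense but the rows just differ in length, one possible workaround is to zero-pad the rows before handing the array to ArrayAdapterDatabaseConnection. The following is only a sketch in plain Java; the class RaggedArrays and its methods are hypothetical helpers, not part of ELKI, and padding only makes sense when a missing trailing entry means zero.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of ELKI. */
final class RaggedArrays {
    /** Returns true if every row has the same length. */
    static boolean isRectangular(double[][] data) {
        for (double[] row : data) {
            if (row.length != data[0].length) {
                return false;
            }
        }
        return true;
    }

    /** Copies the rows, padding shorter ones with trailing zeros. */
    static double[][] padToRectangular(double[][] data) {
        int dim = 0;
        for (double[] row : data) {
            dim = Math.max(dim, row.length);
        }
        double[][] out = new double[data.length][];
        for (int i = 0; i < data.length; i++) {
            // Arrays.copyOf fills the added positions with 0.0
            out[i] = Arrays.copyOf(data[i], dim);
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] ragged = { { 1.0, 2.0, 3.0 }, { 4.0 }, {} };
        System.out.println(isRectangular(ragged));
        System.out.println(Arrays.deepToString(padToRectangular(ragged)));
    }
}
```

For data that is sparse in the sense of mostly zeros, this wastes memory, and a sparse data source is the better fit.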
DoubleVector is a dense data type; SparseDoubleVector is a sparse vector type. DoubleVector is backed by a dense double[] array, whereas SparseDoubleVector uses an int[] with the nonzero dimensions, plus a double[] with the nonzero values only.
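To make the two layouts concrete, here is a small plain-Java sketch of the idea. It is an illustration only: SparseSketch and its methods are made-up names and do not reproduce ELKI's actual SparseDoubleVector implementation.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

/** Hypothetical illustration of dense vs. sparse storage; not ELKI code. */
final class SparseSketch {
    /** Positions of the nonzero entries, in ascending order. */
    static int[] nonzeroIndexes(double[] dense) {
        return IntStream.range(0, dense.length)
                .filter(i -> dense[i] != 0.0)
                .toArray();
    }

    /** The nonzero entries themselves, in the same order as the indexes. */
    static double[] nonzeroValues(double[] dense) {
        return Arrays.stream(dense).filter(v -> v != 0.0).toArray();
    }

    /** Expands (indexes, values) back into a dense array of length dim. */
    static double[] toDense(int[] indexes, double[] values, int dim) {
        double[] out = new double[dim];
        for (int i = 0; i < indexes.length; i++) {
            // Throws ArrayIndexOutOfBoundsException if indexes[i] >= dim
            out[indexes[i]] = values[i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] dense = { 0.0, 3.5, 0.0, 0.0, 1.25 };
        int[] idx = nonzeroIndexes(dense);
        double[] val = nonzeroValues(dense);
        System.out.println(Arrays.toString(idx) + " " + Arrays.toString(val));
        System.out.println(Arrays.toString(toDense(idx, val, dense.length)));
    }
}
```

Note that the sparse form does not record the total dimensionality by itself; it has to be known separately when converting back to a dense array.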
K-means requires a fixed dimensionality to allocate the mean vectors (these will always be dense), so make sure to supply a VectorFieldTypeInformation with the maximum dimensionality. There is a type conversion filter that simply scans your data set once and sets the dimension accordingly.
Upvotes: 1