Zraf Ker

Reputation: 3

Problems with creating a mathematical clustering model with an additive criterion in CPLEX OPL Studio

I'm trying to create a model in CPLEX OPL Studio for clustering with an additive criterion, but I get a number of errors that I don't know how to fix, because I have very little experience with OPL Studio. Initially, there was a loss function that calculates the deviation from the cluster center. Next, I substituted the values into the general loss function, which gives the following formula. There is also a formula for calculating the cluster centers.

`   // Number of clients, number of features, and number of clusters
   int n = ...; // Number of clients
   int m = ...; // Number of features
   int k = ...; // Number of clusters

   // Client data: feature values for each client
   float data[i in 1..n][j in 1..m] = ...;

   // Binary variables: x[i][c] = 1 if client i is assigned to cluster c
   dvar boolean x[1..n][1..k];

   // Variables for the center of each cluster for each feature
   dvar float mu[1..k][1..m];

   // Model
   minimize
       sum(c in 1..k, i in 1..n, j in 1..m) x[i][c] * (data[i][j] - mu[c][j])^2;

   // Constraints
   subject to {
       // Each client belongs to exactly one cluster
        forall(i in 1..n)
           sum(c in 1..k) x[i][c] == 1;
    
       // Definition of cluster centers
       forall(c in 1..k, j in 1..m)
            mu[c][j] == sum(i in 1..n) x[i][c] * data[i][j] / sum(i in 1..n) x[i][c];
    }`
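The formulas mentioned in the question did not survive as text. Read off the OPL model above, they appear to correspond to the following, where $d_{ij}$ stands for `data[i][j]` (the symbol $d$ is an assumption used here for illustration; the other symbols mirror the identifiers in the code):

```latex
\min_{x,\,\mu} \; \sum_{c=1}^{k} \sum_{i=1}^{n} \sum_{j=1}^{m} x_{ic}\,\bigl(d_{ij} - \mu_{cj}\bigr)^{2}
\quad \text{subject to} \quad
\sum_{c=1}^{k} x_{ic} = 1 \;\; \forall i,
\qquad
\mu_{cj} = \frac{\sum_{i=1}^{n} x_{ic}\, d_{ij}}{\sum_{i=1}^{n} x_{ic}} \;\; \forall c, j,
\qquad
x_{ic} \in \{0, 1\}.
```

The center definition divides by $\sum_i x_{ic}$, which is an expression of decision variables and is zero for an empty cluster.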

I tried to write code for these formulas, but ran into syntax problems. For example: `CPLEX (default) failed to parse expression: forall(c in 1..3, j in 1..4) mu[c][j] == sum(i in 1..5) (x[i][c]*data[i][j]) / (sum(i in 1..5) x[i][c])`. It might be worth adding more constraints, but I'm a little confused.

Upvotes: 0

Views: 38

Answers (1)

Alex Fleischer

Reputation: 10062

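Within CPLEX, I would rather use the Constraint Programming engine (CP Optimizer), which is selected with `using CP;`. The cluster centers can then be written as decision expressions (`dexpr`):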

using CP;



 // Number of clients, number of features, and number of clusters
   int n = 3; // Number of clients
   int m = 4; // Number of features
   int k = 2; // Number of clusters

   // Client data: feature values for each client
   float data[i in 1..n][j in 1..m] = i*j;

   // Binary variables: x[i][c] = 1 if client i is assigned to cluster c
   dvar boolean x[1..n][1..k];

   // Variables for the center of each cluster for each feature
   
   dexpr float mu[c in 1..k][j in 1..m]=
   sum(i in 1..n) x[i][c] * data[i][j] / sum(i in 1..n) x[i][c];

   // Model
   minimize
       sum(c in 1..k, i in 1..n, j in 1..m) x[i][c] * (data[i][j] - mu[c][j])^2;

   // Constraints
   subject to {
       // Each client belongs to exactly one cluster
        forall(i in 1..n)
           sum(c in 1..k) x[i][c] == 1;
    
       // Definition of cluster centers
       forall(c in 1..k, j in 1..m)
            mu[c][j] == sum(i in 1..n) x[i][c] * data[i][j] / sum(i in 1..n) x[i][c];
    }

This works fine.
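As a sanity check on this toy instance (n = 3, m = 4, k = 2, `data[i][j] = i*j`), here is a minimal brute-force sketch in Python rather than OPL. It enumerates every assignment and evaluates the same additive criterion. The function name `additive_criterion` is illustrative, and the sketch assumes that assignments leaving a cluster empty are skipped, since the center would then be a division by zero.

```python
from itertools import product

# Same toy instance as the OPL model: data[i][j] = i * j
n, m, k = 3, 4, 2
data = [[i * j for j in range(1, m + 1)] for i in range(1, n + 1)]


def additive_criterion(assignment):
    """Sum of squared deviations from each cluster's mean.

    Returns None when a cluster is empty (assumption: such assignments
    are excluded because the center would be 0 / 0).
    """
    total = 0.0
    for c in range(1, k + 1):
        members = [data[i] for i in range(n) if assignment[i] == c]
        if not members:
            return None
        # Center of cluster c: feature-wise mean of its members
        center = [sum(col) / len(members) for col in zip(*members)]
        total += sum((v - mu) ** 2
                     for row in members
                     for v, mu in zip(row, center))
    return total


# Enumerate all k^n assignments and keep the feasible ones
candidates = []
for assignment in product(range(1, k + 1), repeat=n):
    cost = additive_criterion(assignment)
    if cost is not None:
        candidates.append((cost, assignment))

best_cost, best_assignment = min(candidates)
print(best_cost, best_assignment)
```

On this data the minimum is 15, reached by grouping two adjacent clients together and leaving the third alone, which gives a value to compare against the objective reported by the solver.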

Or, with a better formulation in which each client gets an integer cluster index instead of a row of binary variables:

using CP;



 // Number of clients, number of features, and number of clusters
   int n = 3; // Number of clients
   int m = 4; // Number of features
   int k = 2; // Number of clusters

   // Client data: feature values for each client
   float data[i in 1..n][j in 1..m] = i*j;

   // Which cluster x[i]
   dvar int x[1..n] in 1..k;

   // Variables for the center of each cluster for each feature
   
   dexpr float mu[c in 1..k][j in 1..m]=
   sum(i in 1..n) (x[i]==c) * data[i][j] / sum(i in 1..n) (x[i]==c);

   // Model
   minimize
       sum(c in 1..k, i in 1..n, j in 1..m) (x[i]==c) * (data[i][j] - mu[c][j])^2;

   // Constraints
   subject to {
       
    
       // Definition of cluster centers
       forall(c in 1..k, j in 1..m)
            mu[c][j] == sum(i in 1..n) (x[i]==c) * data[i][j] / sum(i in 1..n) (x[i]==c);
    }
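The integer formulation needs no "exactly one cluster" constraint because `x[i]` holds a single cluster index, so the indicator `(x[i]==c)` is 1 for exactly one `c`. A small Python sketch of that equivalence follows; the helper name `one_hot` and the sample assignment are hypothetical.

```python
def one_hot(assignment, k):
    """Row i has a 1 in the column of cluster c exactly when assignment[i] == c."""
    return [[int(a == c) for c in range(1, k + 1)] for a in assignment]


x_int = [1, 2, 2]            # client -> cluster index, as in dvar int x[1..n] in 1..k
x_bin = one_hot(x_int, k=2)  # equivalent binary matrix, as in dvar boolean x[1..n][1..k]

# Every row sums to 1 by construction, so the assignment constraint is implicit
assert all(sum(row) == 1 for row in x_bin)
print(x_bin)
```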

See https://github.com/AlexFleischerParis/opltipsandtricks/blob/master/kmeans.mod

Upvotes: 0
