user2581695

Reputation: 21

Clustering data based on relationship patterns between independent variable and dependent variable(s)

I would like to cluster 2-dimensional input data with a 1-D output, based on the relationship between the dependent variable and the independent variables. For example, suppose the two independent dimensions are x and y, the dependent variable is z, and the relationship between (x, y) and z differs across regions of the xy-space. I would like to cluster the data so that regions of the xy-space that exhibit the same functional relationship with z fall into one cluster. The functional relationships that can exist between the xy-space and z are not known a priori.

It would be great if someone could point me to machine learning techniques, or references to them, that can be used as is or modified to fit this problem.

Upvotes: 1

Views: 1302

Answers (1)

lejlot

Reputation: 66835

There is no good answer to this question, as it is the core concept of the whole field of hybridization between clustering and classification techniques. As a result, dozens of approaches have been proposed, ranging from clustering the initial data (the whole XYZ space in your case), through independent analysis of the possible behaviour of classification models in each cluster, to fully merging both processes into one big optimization problem. In my opinion, it is almost as wide as asking "I have data in the form (x, f(x)) and want to reconstruct f; how do I do it?"

So, for references, search for anything related to clustering and classification hybrids, as the problem you are asking about is equivalent to finding a good clustering for modeling the (partially) independent classification/regression tasks.
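To make the "one big optimization problem" end of that range concrete, here is a minimal sketch of clusterwise linear regression: it alternates between fitting one least-squares plane per cluster and reassigning each point to the plane that predicts it best. It assumes each region's relationship is roughly linear in (x, y), which your problem does not guarantee. The function name, the synthetic data and the choice of two clusters are illustrative assumptions, and the result depends on the random initial partition.

```python
import numpy as np

def clusterwise_linear_regression(X, z, n_clusters=2, n_iter=50, seed=0):
    """Hypothetical helper: alternate per-cluster least squares and reassignment."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    A = np.column_stack([X, np.ones(n)])        # design matrix with intercept
    labels = rng.integers(n_clusters, size=n)   # random initial partition
    coefs = np.zeros((n_clusters, A.shape[1]))
    sse_history = []
    for _ in range(n_iter):
        # Refit step: ordinary least squares within each cluster.
        for k in range(n_clusters):
            mask = labels == k
            if mask.sum() >= A.shape[1]:        # keep old model if cluster is too small
                coefs[k], *_ = np.linalg.lstsq(A[mask], z[mask], rcond=None)
        # Assignment step: squared residual of every point under every model.
        resid = (z[:, None] - A @ coefs.T) ** 2  # shape (n, n_clusters)
        new_labels = resid.argmin(axis=1)
        sse_history.append(resid[np.arange(n), new_labels].sum())
        if np.array_equal(new_labels, labels):   # partition stopped changing
            break
        labels = new_labels
    return labels, coefs, sse_history

# Synthetic data (an assumption for illustration): two regimes split at x = 0.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
z = np.where(X[:, 0] < 0, 2 * X[:, 0] + X[:, 1], -3 * X[:, 0] + 4)
z = z + 0.05 * rng.normal(size=200)

labels, coefs, sse_history = clusterwise_linear_regression(X, z)
print(coefs)             # one row of (a, b, intercept) per cluster
print(sse_history[-1])   # total squared error of the final partition
```

The total squared error cannot increase from one iteration to the next, but the procedure can stop in a local optimum, so in practice you would likely restart it from several seeds and keep the best run.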

Of course, if you know something about the form of this functional relationship, then the whole problem can be quite easy to solve. For example, if you know that your functional relationship is more or less a Gaussian function, you could simply fit a Gaussian mixture model to your data. In general, EM (expectation maximization) would be a good choice given some knowledge about the function.
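As a rough illustration of how EM uses such knowledge, the sketch below assumes the known form is "z is linear in (x, y) plus Gaussian noise" within each region and fits a mixture of linear regressions. That assumed form, the function name, the number of components and the synthetic data are all illustrative choices, not something taken from your problem. The E-step computes soft cluster memberships from the Gaussian likelihood of each residual, and the M-step refits each component by weighted least squares.

```python
import numpy as np

def em_mixture_of_regressions(X, z, n_components=2, n_iter=100, seed=0):
    """Hypothetical helper: EM for a mixture of linear regressions."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    A = np.column_stack([X, np.ones(n)])                   # design matrix with intercept
    resp = rng.dirichlet(np.ones(n_components), size=n)    # random soft memberships
    coefs = np.zeros((n_components, A.shape[1]))
    sigma2 = np.ones(n_components)
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # M-step: weighted least squares, noise variance and mixing weight.
        for k in range(n_components):
            w = resp[:, k] + 1e-12                         # guard against empty components
            sw = np.sqrt(w)
            coefs[k], *_ = np.linalg.lstsq(A * sw[:, None], z * sw, rcond=None)
            r2 = (z - A @ coefs[k]) ** 2
            sigma2[k] = max((w * r2).sum() / w.sum(), 1e-8)
            weights[k] = w.mean()
        # E-step: memberships proportional to weight times Gaussian likelihood.
        r2 = (z[:, None] - A @ coefs.T) ** 2               # shape (n, n_components)
        log_p = np.log(weights) - 0.5 * np.log(2 * np.pi * sigma2) - 0.5 * r2 / sigma2
        log_p -= log_p.max(axis=1, keepdims=True)          # stabilise the exponentials
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
    return resp, coefs, sigma2, weights

# Synthetic data (an assumption for illustration): two regimes split at x = 0.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
z = np.where(X[:, 0] < 0, 2 * X[:, 0] + X[:, 1], -3 * X[:, 0] + 4)
z = z + 0.05 * rng.normal(size=200)

resp, coefs, sigma2, weights = em_mixture_of_regressions(X, z)
hard_labels = resp.argmax(axis=1)   # most probable component for each point
print(coefs)
```

Note that the memberships here depend only on how well each component explains z, not on where the point sits in the xy-space, so spatially contiguous regions are not enforced.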

Upvotes: 3
