ImBaldingPleaseHelp

Reputation: 123

K-Means output does not appear as expected

I'm trying to cluster my data. My goal is to use the clusters to identify whether each customer is B2B or B2C, following these rules:

  1. if high number_of_invoice and high avg_top then it's B2B
  2. if high number_of_invoice and low avg_top then it's B2B
  3. if low number_of_invoice and high avg_top then it's B2C
  4. if low number_of_invoice and low avg_top then it's B2C
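Assuming the four rules cover all four high/low combinations, the label depends only on number_of_invoice, and avg_top never changes it. A minimal sketch of the rules as a direct labelling function follows. The function name `label_customers`, the sample data, and the use of the median as the "high" cut-off are all hypothetical choices for illustration.

```python
import numpy as np

def label_customers(number_of_invoice, threshold):
    """Apply the rules: B2B if number_of_invoice is high, B2C otherwise.

    avg_top is not an argument because, under the rules, it never changes the label.
    "High" is assumed to mean strictly above `threshold` (a hypothetical cut-off).
    """
    number_of_invoice = np.asarray(number_of_invoice)
    return np.where(number_of_invoice > threshold, "B2B", "B2C")

# Hypothetical invoice counts; the median is used as the cut-off purely for illustration.
invoices = np.array([2, 40, 5, 60, 3, 55])
labels = label_customers(invoices, threshold=np.median(invoices))
print(labels)  # ['B2C' 'B2B' 'B2C' 'B2B' 'B2C' 'B2B']
```

If the rules really are this simple, a threshold like this reproduces the expected split exactly, without any clustering.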

I have removed the outliers, and the distribution looks like this:

[Image: data distribution]

I expected it to separate simply, like this:

[Image: expected separation]

This is the actual cluster output: [Image: K-Means cluster output]

I measured the silhouette score, which is 0.677. Is there any way to get the cluster separation I expected?
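For reference, a minimal sketch of fitting K-Means with two clusters and computing the silhouette score with scikit-learn. The data is a hypothetical random stand-in for the two columns. The standardisation step is an assumption, since the question doesn't say whether the features were scaled: K-Means uses Euclidean distance, so a feature with a much larger range tends to dominate the split.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the two columns used in the question.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.integers(1, 100, size=300),  # number_of_invoice
    rng.uniform(0, 90, size=300),    # avg_top
])

# Assumption: put both features on a comparable scale before clustering.
X_scaled = StandardScaler().fit_transform(X)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# Silhouette ranges from -1 to 1; higher means tighter, better-separated clusters.
score = silhouette_score(X_scaled, labels)
print(f"silhouette score: {score:.3f}")
```

A high silhouette score only says the clusters are geometrically compact and separated. It does not say they match the B2B/B2C rules.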

Upvotes: 1

Views: 160

Answers (1)

Khalid Saifullah

Reputation: 795

It's unclear whether you want to classify your customers (B2B/B2C) or cluster your data, as you've mixed up the two terms.

If you want to cluster with K-Means, the short answer to your question is no: you can't make the clusters come out the way you expect. K-Means is a stochastic process. The algorithm randomly initializes the cluster centers and then iteratively updates them to minimize the within-cluster sum of squared distances (the inertia), with no knowledge of your B2B/B2C rules. Because of this random initialization, the result of one run will very likely differ from the next unless you fix the random seed.
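A minimal sketch of the run-to-run variation described above, using scikit-learn's `KMeans` on hypothetical random data. `init="random"` with `n_init=1` keeps a single random initialization, so the `random_state` seed alone decides the outcome. The helper name `run` is illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data with no clear cluster structure.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))

def run(seed):
    # One random initialization per fit, so the result depends only on the seed.
    km = KMeans(n_clusters=2, init="random", n_init=1, random_state=seed)
    km.fit(X)
    return km.labels_, km.inertia_

labels_a, inertia_a = run(seed=0)
labels_b, inertia_b = run(seed=0)  # same seed: repeatable
labels_c, inertia_c = run(seed=1)  # different seed: may converge elsewhere

print("same seed, identical labels:", np.array_equal(labels_a, labels_b))
print("inertia, seed 0 vs seed 1:", inertia_a, inertia_c)
```

With a different seed, the cluster IDs may simply be swapped (0 and 1 are arbitrary), or the partition itself may change. Passing a fixed integer `random_state` makes runs repeatable, but it still doesn't steer K-Means toward a particular split.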

Note: If you want to classify customers as B2B or B2C, you can try classification methods such as decision trees. However, since the two classes seem to be completely entangled and on top of each other in your data, I think you should try neural networks to capture that complexity.
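Classification needs customers whose B2B/B2C label is already known. A minimal sketch with scikit-learn's `DecisionTreeClassifier` follows. Both the data and the labels are hypothetical: labels are generated from the question's rules with an assumed cut-off of 50 invoices (1 = B2B, 0 = B2C).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical labelled data: columns are number_of_invoice and avg_top.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.integers(1, 100, size=500),
    rng.uniform(0, 90, size=500),
])
# Hypothetical labels from the rules: B2B (1) when number_of_invoice > 50, else B2C (0).
y = (X[:, 0] > 50).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)  # mean accuracy on held-out customers
print(f"test accuracy: {accuracy:.2f}")
```

`sklearn.neural_network.MLPClassifier` has the same `fit`/`score` interface, so it can be swapped in to try the neural-network route on real labelled data.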

Upvotes: 2
