Reputation: 123
I'm trying to do clustering with my data. My target is to cluster this data to identify if the type of customer is B2B or B2C, with rules:
number_of_invoice and high avg_top, then it's B2B
number_of_invoice and low avg_top, then it's B2B
number_of_invoice and high avg_top, then it's B2C
number_of_invoice and high avg_top, then it's B2C
I have removed the outliers, and the distribution looks like this
and I thought it would separate simply like this
I have measured the Silhouette score, and it is 0.677. Is there any way to achieve the separation of clusters that I expected?
Upvotes: 1
Views: 160
Reputation: 795
It's unclear whether you want to classify customers (B2B/B2C) or cluster your data, as you've mixed the two terms up.
If you want to cluster using K-Means, then the simple answer to your question is no: you can't force the clusters to come out the way you expect. K-Means is unsupervised and knows nothing about your B2B/B2C rules. It randomly initializes the cluster centers and then iteratively updates them to minimize the within-cluster sum of squared distances, so it simply partitions the points by distance. Because of the random initialization, the result of one run is not guaranteed to match the next unless you fix the random seed.
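To make the point about initialization concrete, here is a minimal sketch of the K-Means loop in plain Python. It is not your code or scikit-learn's implementation: the `kmeans` helper, the synthetic two-blob data, and the seed values are all assumptions for illustration, with the two coordinates standing in for scaled `number_of_invoice` and `avg_top`.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]

def kmeans(points, k, seed, max_iter=100):
    """Hypothetical helper: plain Lloyd-style K-Means.

    Returns (centers, labels, inertia), where inertia is the
    within-cluster sum of squared distances.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # random initialization
    for _ in range(max_iter):
        labels = assign(points, centers)     # assignment step
        new_centers = []
        for j in range(k):                   # update step: mean of each cluster
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:           # converged: centers stopped moving
            break
        centers = new_centers
    labels = assign(points, centers)
    inertia = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, inertia

# Synthetic data (made up): two Gaussian blobs in two dimensions.
data_rng = random.Random(0)
points = ([(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(100)]
          + [(data_rng.gauss(5, 1), data_rng.gauss(5, 1)) for _ in range(100)])

run_a = kmeans(points, k=2, seed=1)
run_b = kmeans(points, k=2, seed=1)   # same seed: same starting centers
run_c = kmeans(points, k=2, seed=2)   # different seed: result may differ

print(run_a == run_b)                 # compares centers, labels and inertia
print(run_a[2], run_c[2])             # inertia of the two differently seeded runs
```

With scikit-learn, the usual way to get the same reproducibility is to pass `random_state` to `KMeans`. Either way, the algorithm only sees distances, so a fixed seed makes the run repeatable but does not make the clusters follow your B2B/B2C rules.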
Note: If you want to classify B2B and B2C customers, you can try classification methods such as decision trees. However, since the two classes seem to be entangled and lie on top of each other in your data, I think you should try neural networks to capture the complexity of your data.
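As a rough illustration of the classification route, the sketch below fits a depth-1 decision tree (a decision stump) in plain Python by trying every threshold on every feature and keeping the split with the fewest misclassifications. Everything in it is an assumption: the `fit_stump` and `predict` helpers are hypothetical, and the rows and their B2B/B2C labels are invented toy values, not your data or your rules. A real tree, for example scikit-learn's `DecisionTreeClassifier`, would stack many such splits.

```python
def fit_stump(rows, labels):
    """Hypothetical helper: fit a depth-1 decision tree.

    Tries every (feature, threshold, label order) combination and keeps
    the one with the fewest training errors.
    Returns (feature_index, threshold, left_label, right_label).
    """
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            for left, right in (("B2B", "B2C"), ("B2C", "B2B")):
                errors = sum((left if r[f] <= t else right) != y
                             for r, y in zip(rows, labels))
                if best is None or errors < best[0]:
                    best = (errors, f, t, left, right)
    return best[1:]

def predict(stump, row):
    """Apply the single threshold test learned by fit_stump."""
    f, t, left, right = stump
    return left if row[f] <= t else right

# Toy labeled data (made up): (number_of_invoice, avg_top) per customer.
rows = [(2, 10), (3, 45), (4, 20), (5, 60),
        (30, 15), (40, 50), (55, 30), (60, 70)]
labels = ["B2C", "B2C", "B2C", "B2C", "B2B", "B2B", "B2B", "B2B"]

stump = fit_stump(rows, labels)
print(stump)                      # chosen feature index, threshold and labels
print(predict(stump, (50, 25)))   # classify a new, unseen customer
```

The key difference from clustering is that the labels are part of the input, so the model is fitted to reproduce them. If the classes overlap heavily, a single threshold like this will misclassify many points, which is where deeper trees or other models come in.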
Upvotes: 2