Reputation: 377
I'm running a cluster analysis in both R and SAS, and the results are very different.
I know the results are partly random, so a small difference is normal, but this difference is huge.
I ran a test with the well-known CARS dataset from SAS (sashelp.cars).
In R, I run:
kmeans(CARS[,c(8,10)],5)
Result: (between_SS / total_SS = 93.2 %)
In SAS, I run:
proc fastclus data=sashelp.cars maxclusters=5; var EngineSize Horsepower; run;
Result: Approximate Expected Over-All R-Squared = 0.96079
The difference is smaller in this test, but it is still there. I ran the test several times, and the results stayed the same.
Where does this difference come from?
Upvotes: 0
Views: 1435
Reputation: 44525
I'm pretty sure, from the documentation, that the two rely on different algorithms. The SAS documentation vaguely describes a method of "nearest centroid sorting". I don't know anything substantive about it, but perhaps look into other clustering functions (like hclust) or other packages to find something comparable.
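To get a feel for how much the choice of algorithm alone can move the number, here is a rough sketch in R. It does not reproduce PROC FASTCLUS; it only compares the algorithms that kmeans itself offers, plus hclust. Assumptions: mtcars (columns disp and hp) is used as a stand-in because the CARS data does not ship with R, the helper name r2 is made up for illustration, and nstart, iter.max and the seed are arbitrary choices.

```r
# Stand-in data (assumption): mtcars ships with R, sashelp.cars does not.
x <- mtcars[, c("disp", "hp")]

# Hypothetical helper: the between_SS / total_SS ratio that print(kmeans) shows.
r2 <- function(fit) fit$betweenss / fit$totss

set.seed(1)  # arbitrary seed, only so that the random starts are repeatable

# Fit 5 clusters with three of the algorithms built into kmeans().
algos <- c("Hartigan-Wong", "Lloyd", "MacQueen")
fits <- lapply(algos, function(a) {
  kmeans(x, centers = 5, nstart = 25, iter.max = 100, algorithm = a)
})
names(fits) <- algos

# One ratio per algorithm, for side-by-side comparison.
print(round(sapply(fits, r2), 4))

# hclust has no random start: build a Ward tree and cut it into 5 groups.
hc <- cutree(hclust(dist(x), method = "ward.D2"), k = 5)
print(table(hc))
```

If the ratios differ between algorithms even inside R, a gap between R and SAS is less surprising. Raising nstart makes kmeans less sensitive to an unlucky random start, which helps separate "different algorithm" from "different starting points".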
Upvotes: 2