Ricol

Reputation: 377

R and SAS: different results for cluster analysis

I'm running a cluster analysis in both R and SAS, and the results are really different.

I know the results are partly random, so a small difference is normal, but the difference I see is huge.

As a test, I used the well-known CARS dataset from SAS.

In R, I run:

kmeans(CARS[,c(8,10)],5)

Result: between_SS / total_SS = 93.2 %

In SAS, I run:

proc fastclus data=sashelp.cars maxclusters=5;
  var EngineSize Horsepower;
run;

Result: Approximate Expected Over-All R-Squared = 0.96079

The difference is smaller on this dataset, but it is still there. I ran the test a few times, and the results stay the same.

Where does this difference come from?

Upvotes: 0

Views: 1435

Answers (1)

Thomas

Reputation: 44525

I'm pretty sure, from the documentation, that they rely on different algorithms. The SAS documentation vaguely describes a method of "nearest centroid sorting". I don't know anything about this substantively, but perhaps look into other clustering functions (like hclust) or other packages to find something comparable.
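To get a feel for how much the choice of algorithm matters on the R side alone, here is a rough sketch that fits the same data with three of the algorithms kmeans offers (Hartigan-Wong is the default) and prints between_SS / total_SS for each. Two assumptions: I use mtcars (disp and hp) as a stand-in because the SAS CARS data does not ship with R, and r_squared is just a helper name I made up. Also, as far as I can tell from the SAS documentation, "Approximate Expected Over-All R-Squared" is an expected value under a null hypothesis rather than the observed R-squared, so it may not be the right number to compare against R's ratio in the first place.

```r
# Stand-in data: mtcars ships with R; disp and hp play the role of
# EngineSize and Horsepower (an assumption, this is not the SAS CARS data).
x <- mtcars[, c("disp", "hp")]

# Hypothetical helper: share of variance explained by a k-means fit,
# i.e. the between_SS / total_SS figure that print(kmeans(...)) reports.
r_squared <- function(fit) fit$betweenss / fit$totss

set.seed(42)  # k-means starts from random centers, so fix the seed

# Same data and same k, fitted with three algorithms from stats::kmeans.
algorithms <- c("Hartigan-Wong", "Lloyd", "MacQueen")
fits <- lapply(algorithms, function(a) {
  # nstart = 25 keeps the best of 25 random starts to reduce run-to-run noise
  kmeans(x, centers = 5, nstart = 25, iter.max = 100, algorithm = a)
})
names(fits) <- algorithms

# One ratio per algorithm.
ratios <- sapply(fits, r_squared)
print(round(ratios, 4))
```

With nstart raised like this, the remaining run-to-run variation should be small, which makes it easier to tell an algorithm effect from plain random initialization.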

Upvotes: 2
