Reputation:
I am a developer that has been tasked with working out how previous results using SPSS were gathered, so we can repeat the process with some new data. We can't ask the person who did the original analysis because he is sadly no longer with us, so it has fallen to me to unravel what he did.
I am not a statistician and do not need to understand the principles involved. I really just need to know what menu items to navigate to.
We had a survey done, which asked a lot of questions of 10,000 people. A subset of 15 of these questions is being used for the analysis.
I know that factor analysis was done to reduce the data to 4 sets. K-means clustering was then used to find the cluster centers. This is what I'm after now.
I have worked out how to do the factor analysis to get the component score coefficient matrix that matches the data I have in my database. This was done by going to Analyze > Dimension Reduction > Factor. I then chose a fixed number of factors (4) from the "Extract" section, "Varimax" rotation from the "Rotation" section and checked the "Display factor score coefficient matrix" in the "Scores" section.
This gave data like this:
Matrix Value 1 Value 2 Value 3 Value 4 Q1 -0.0756 0.2134 -0.0245 -0.1236 Q2 ... ... ... ... Q3 ... ... ... ... ...
What I have no idea of is how to proceed with this to do the k-means clustering.
The results I have in the database look like this:
Cluster centers Value 1 Value 2 Value 3 Value 4 Value 5 FAC1_1 -0.8373 -0.5766 0.2100 1.3499 0.2940 FAC2_1 ... ... ... ... ... FAC3_1 ... ... ... ... ... FAC4_1 ... ... ... ... ...
Now, I know that k-means clustering can be done on the original data set by using Analyze > Classify > K-means Cluster, but I don't know how to reference the factor analysis I've done.
Could someone give me some insight into how to create these cluster centers using SPSS?
Upvotes: 2
Views: 5359
Reputation: 29
I have done same kind of analysis for a project of mine. First carry out the factor analysis, once you have been able to extract good amount of variance from the factor analysis try to save the factor scores (In SPSS).
For saving the factor scores go to Analyse->Dimension Reduction->Factor->Score->Save as variables.
As you save the scores there would be new variables created in the Variable view based on the number of components.
After you have been able to save the scores of the factors go to Analyse->Classify->K-Means and select the new variables (Factors Scores) enter the number of initial clusters required then OK.
Upvotes: 1
Reputation: 2109
I'm doing the same set of analyses for a project. Just for your information, two-step clustering process offered by SPSS is more robust that K-means (Punj & Stewart 1983). In K-means, how are you going to choose the K?! You can also use the clvalid package to get the optimal number of K if you insist on using K-means.
Punj, G., & Stewart, D. W. (1983). Cluster analysis in marketing research: review and suggestions for application. Journal of marketing research, 134-148.
Upvotes: 0
Reputation: 5417
If you have access to the system where the original work was done, look for the journal file (typically named statistics.jnl and kept in the location specified under Edit > Options > Files). If journaling was in effect with the append option, it will have all the commands the user ran.
Upvotes: 0
Reputation: 2929
In the GUI for FACTOR analysis (Analyze > Dimension Reduction > Factor), you have a sub-dialog "Scores", make sure "Save as variables" is checked.
This will save the factor scores in your data i.e. the variables FAC1_1, FAC2_1, FAC3_1, FAC4_1.
It is these variable that you then need to add as input variables in the K-means GUI.
It is better to setup your work in a syntax so if ever anyone else ever wants to replicate your work they can do so (and ideally your predecessor should have left his bread crumbs in a syntax document too. I would make every attempt to find this document if there is a remote possibility of it existing, a file of .sps file extension).
Here's how you'd set this up in syntax and what his/her workings may have looked like:
/* Replicate the factor analysis (four factors) and save the factor score variables */.
FACTOR
/VARIABLES < INPUT THE 15 VARIABLES HERE >
/MISSING LISTWISE
/ANALYSIS < INPUT THE 15 VARIABLES HERE >
/PRINT EXTRACTION ROTATION FSCORE
/FORMAT SORT BLANK(.10)
/PLOT ROTATION
/CRITERIA FACTORS(4) ITERATE(25)
/EXTRACTION PC
/CRITERIA ITERATE(25)
/ROTATION VARIMAX
/SAVE REG(ALL)
/METHOD=CORRELATION.
/* Replicate the clustering using factor scores as inputs, generating 5 segments */.
QUICK CLUSTER FAC1_1 FAC2_1 FAC3_1 FAC4_1
/MISSING=LISTWISE
/CRITERIA=CLUSTER(5) MXITER(10) CONVERGE(0)
/METHOD=KMEANS(NOUPDATE)
/SAVE CLUSTER (Seg5)
/PRINT INITIAL.
/* Check centroids match*/.
MEANS FAC1_1 FAC2_1 FAC3_1 FAC4_1 BY Seg5 /CELLS MEAN.
If you can replicate the FACTOR score variables to match exactly, then that is a good start, if the centroids do not match then, given the factor scores do match, then it can only be/most likely to be because the segment assignments are now different. Despite using the same input/methodology if the case ordering is different to previously, K-Means QUICK CLUSTER, can and will most likely yield different segment assignments due to random starting points.
I don't know any way round this but in principle these are the likely steps he/she had taken.
Upvotes: 4