Reputation: 1362
I'm new to R and not familiar with PCA. From a survey I have observations on nine variables: the first is the respondent's gender, the next five (Q51_1_c, Q51_2_c, Q51_4_c, Q51_6_c, Q51_7_c) ask about entrepreneurial issues, and the last three (Q56_1_c, Q56_2_c, Q56_3_c) ask about future expectations. Except for gender, all of these variables take values between 1 and 5. I want to make a scatter plot with two axes, one summarising the "entrepreneurial" variables and the other the "future expectations" variables, and then plot the positions of Male and Female as points. My data look like this:
x <- "Q1b Q51_1_c Q51_2_c Q51_4_c Q51_6_c Q51_7_c Q56_1_c Q56_2_c Q56_3_c
3 Male 5 4 4 4 4 5 4 4
4 Female 4 3 4 4 3 3 4 3
5 Female 1 1 1 1 1 3 1 1
7 Female 2 1 1 1 1 5 1 4
8 Female 4 4 5 4 4 5 4 4
9 Female 3 3 4 4 3 3 4 4
13 Male 4 4 4 4 5 3 3 3
15 Female 3 4 4 4 4 1 1 5
16 Female 4 1 4 4 4 3 3 3
19 Female 3 2 3 3 3 3 3 3
20 Male 1 1 1 1 1 3 1 5
21 Female 3 1 1 2 1 3 3 3
26 Female 5 5 1 2 1 4 4 3
27 Female 2 1 1 1 1 1 1 1
29 Male 2 2 2 2 1 4 4 4
31 Female 3 1 1 1 1 5 2 3
34 Female 4 1 1 4 3 3 1 4
36 Female 5 1 1 4 4 5 1 2
37 Male 5 1 2 4 4 5 4 5
38 Female 3 1 1 1 1 1 1 1"
To run PCA this is my code:
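x <- read.table(text = x, header = TRUE)  # parse the text above; the leading field becomes row names
x <- na.omit(x)  # Just to simplify
resul <- prcomp(x[, -1], scale. = TRUE)  # drop gender (column 1) and standardise the items
x$PC1 <- resul$x[, 1]  # Saving scores on PC1
x$PC2 <- resul$x[, 2]  # Saving scores on PC2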
The resulting axes look like this:
biplot(resul, scale = 0)
Finally, to make the scatter plot:
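library(dplyr)    # %>%, group_by(), summarise()
library(ggplot2)  # ggplot(), geom_point(), theme_bw()

x %>%
  group_by(Q1b) %>%
  summarise(mean_PC1 = mean(PC1),
            mean_PC2 = mean(PC2)) %>%
  ggplot(aes(x = mean_PC1, y = mean_PC2, colour = Q1b)) +
  geom_point() +
  theme_bw()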
I'm not sure how to read the results. Should I conclude that females in general score higher on the future-expectations dimension than males, and that males score higher on the entrepreneurial dimension?
Thanks in advance!!
Upvotes: 0
Views: 1171
Reputation: 1042
Your interpretation of the axes looks correct, i.e., PC1 is a gradient which from left to right represents decreasing "entrepreneurialness", while PC2 is a gradient which from bottom to top represents increasing future expectations (assuming that "5" in the original data means highest entrepreneurialness/expectations).
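Because the sign of a principal component is arbitrary, it is worth checking the direction of each axis against the loadings rather than reading it off the biplot alone. A minimal sketch, assuming the 20 rows posted in the question (the names `txt`, `dat`, `pca` and `loadings12` are only illustrative):

```r
# The 20 rows from the question; the leading field becomes the row names.
txt <- "Q1b Q51_1_c Q51_2_c Q51_4_c Q51_6_c Q51_7_c Q56_1_c Q56_2_c Q56_3_c
3 Male 5 4 4 4 4 5 4 4
4 Female 4 3 4 4 3 3 4 3
5 Female 1 1 1 1 1 3 1 1
7 Female 2 1 1 1 1 5 1 4
8 Female 4 4 5 4 4 5 4 4
9 Female 3 3 4 4 3 3 4 4
13 Male 4 4 4 4 5 3 3 3
15 Female 3 4 4 4 4 1 1 5
16 Female 4 1 4 4 4 3 3 3
19 Female 3 2 3 3 3 3 3 3
20 Male 1 1 1 1 1 3 1 5
21 Female 3 1 1 2 1 3 3 3
26 Female 5 5 1 2 1 4 4 3
27 Female 2 1 1 1 1 1 1 1
29 Male 2 2 2 2 1 4 4 4
31 Female 3 1 1 1 1 5 2 3
34 Female 4 1 1 4 3 3 1 4
36 Female 5 1 1 4 4 5 1 2
37 Male 5 1 2 4 4 5 4 5
38 Female 3 1 1 1 1 1 1 1"
dat <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)

# PCA on the eight survey items (gender excluded), standardised
pca <- prcomp(dat[, -1], scale. = TRUE)

# Loadings of each item on PC1 and PC2: the sign and size show which
# items drive each axis and in which direction
loadings12 <- pca$rotation[, 1:2]
print(round(loadings12, 2))

# Share of total variance captured by each component
print(summary(pca)$importance)
```

If the Q51 items all load on PC1 with the same sign, that sign tells you which end of the axis is "more entrepreneurial"; the same check applies to the Q56 items on PC2.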
In terms of whether males and females are different, you probably need to plot more than just the means for each group: even if males and females are truly identical in their entrepreneurialness/expectations, you'd never expect the means from two samples to sit right on top of each other on a scatter plot. To address this, you could plot the actual observations rather than their means (i.e., one point per row, coloured by gender) and see whether they intermingle or separate in the plot space. Or, regress gender on the principal components.
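Both suggestions could be sketched roughly as follows, again assuming the 20 rows posted in the question. The logistic regression via `glm()` is just one way to "regress gender on the components" and is my assumption about a reasonable choice; with only five males in this excerpt the fit will be very imprecise, so treat it as an illustration of the mechanics only. The names `txt`, `dat`, `scores`, `p` and `fit` are illustrative.

```r
library(ggplot2)

# The 20 rows from the question; the leading field becomes the row names.
txt <- "Q1b Q51_1_c Q51_2_c Q51_4_c Q51_6_c Q51_7_c Q56_1_c Q56_2_c Q56_3_c
3 Male 5 4 4 4 4 5 4 4
4 Female 4 3 4 4 3 3 4 3
5 Female 1 1 1 1 1 3 1 1
7 Female 2 1 1 1 1 5 1 4
8 Female 4 4 5 4 4 5 4 4
9 Female 3 3 4 4 3 3 4 4
13 Male 4 4 4 4 5 3 3 3
15 Female 3 4 4 4 4 1 1 5
16 Female 4 1 4 4 4 3 3 3
19 Female 3 2 3 3 3 3 3 3
20 Male 1 1 1 1 1 3 1 5
21 Female 3 1 1 2 1 3 3 3
26 Female 5 5 1 2 1 4 4 3
27 Female 2 1 1 1 1 1 1 1
29 Male 2 2 2 2 1 4 4 4
31 Female 3 1 1 1 1 5 2 3
34 Female 4 1 1 4 3 3 1 4
36 Female 5 1 1 4 4 5 1 2
37 Male 5 1 2 4 4 5 4 5
38 Female 3 1 1 1 1 1 1 1"
dat <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)

# PCA on the eight survey items, then keep gender next to the first two scores
pca <- prcomp(dat[, -1], scale. = TRUE)
scores <- data.frame(Q1b = dat$Q1b, pca$x[, 1:2])

# One point per respondent, coloured by gender
p <- ggplot(scores, aes(x = PC1, y = PC2, colour = Q1b)) +
  geom_point() +
  theme_bw()
print(p)

# Logistic regression of gender on PC1 and PC2. With factor levels
# Female < Male, the coefficients describe the log-odds of being Male.
fit <- glm(Q1b ~ PC1 + PC2, data = scores, family = binomial)
print(summary(fit))
```

If the two colours intermingle in the plot and neither coefficient is clearly different from zero, the data give no real evidence that the groups differ on these dimensions.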
Another issue is whether it's appropriate to use PCA on ordinal data - see here for discussion.
Upvotes: 1