Reputation: 75
I'm attempting to use SAS to do a pretty basic regression problem but I'm having trouble getting the full set of results.
I'm using a data set that includes professors' overall quality (the dependent variable) and has the following independent variables: gender, numYears, pepper, discipline, easiness, and raterInterest.
I'm using the code below to generate the analysis of the data set:
proc glm data=WORK.IMPORT;
class gender pepper discipline;
model quality = gender numYears pepper discipline easiness raterInterest;
run;
I get the following results, which are mostly what I need, EXCEPT that I would like to see exactly which levels of the class variables (gender, pepper, discipline) are significant.
From these results, I can see that easiness, raterInterest, pepper, and discipline are significant; however, I'd like to see which specific values of pepper and discipline are significant. For example, pepper was answered as a 'yes' or 'no' by the student. I'd like to see whether quality is associated specifically with pepper = yes or pepper = no. Can anyone give me some advice about how to alter my code to return a breakdown of the class variables?
Here is also a link to the dataset (Rateprof), in case it's needed for reference: https://drive.google.com/file/d/1Kc9cb_n-l7qwWRNfzXtZi5OsiY-gsYZC/view?usp=sharing
I really, truly appreciate any assistance!
Upvotes: 1
Views: 777
Reputation: 12909
Add the solution option to your model statement to break out statistics for each level of each class variable; however, reference parameterization is not available in proc glm, so the estimates it reports for class levels are flagged as biased (not uniquely estimable). There are ways around this to continue using proc glm, but the simplest solution is to use proc glmselect instead. proc glmselect allows you to specify reference parameterization. Use the selection=none option to disable variable selection.
proc glmselect data=WORK.IMPORT;
class gender pepper discipline / param=reference;
model quality = gender numYears pepper discipline easiness raterInterest / selection=none;
run;
The interpretation of the gender estimate, for example, would be:
All other variables held constant, the quality rating for females is 0.046782 units lower than for males (an estimate of -0.046782). This variable is not statistically significant.
The breakdown of each class level is a comparison to a reference value. By default, the reference value selected is the last level after all class values are internally sorted. You can specify a reference using the ref= option after each class variable. For example, if you wanted to use females as a reference value instead of males:
proc glmselect data=WORK.IMPORT;
class gender(ref='female') pepper discipline / param=reference;
model quality = gender numYears pepper discipline easiness raterInterest / selection=none;
run;
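The ref= option can be set separately for each class variable. The following is a minimal sketch, not taken from the original analysis: it assumes pepper is stored as the lowercase strings 'yes' and 'no' (as described in the question), and it uses the first keyword to pick the first sorted level of discipline, since the discipline values are not listed here.

```sas
/* Sketch: choose a reference level for every class variable.          */
/* Assumptions: pepper takes the values 'yes'/'no' (lowercase);        */
/* discipline levels are unknown, so ref=first picks the first sorted. */
proc glmselect data=WORK.IMPORT;
    class gender(ref='female')
          pepper(ref='no')
          discipline(ref=first) / param=reference;
    model quality = gender numYears pepper discipline easiness raterInterest
          / selection=none;
run;
```

With this setup, each reported pepper or discipline estimate would be read as the difference from the chosen reference level, all other variables held constant.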
Note that you can also do this with proc mixed. For this specific purpose, the choice is up to you, based on the output style that you prefer. proc mixed is a more flexible way to run regressions, but would be a bit overkill here.
proc mixed data=import;
class gender pepper discipline;
model quality = gender numYears pepper discipline easiness raterInterest / solution;
run;
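For completeness, here is a rough sketch of what staying in proc glm might look like. This is only one possible approach and not necessarily the workaround meant above: it adds the solution option to the model statement and, as an extra, an lsmeans statement with pdiff to compare the levels of pepper and discipline pairwise.

```sas
/* Sketch: per-level estimates and pairwise comparisons in proc glm.   */
/* The solution option prints one estimate per class level; estimates  */
/* marked with 'B' are not uniquely estimable under GLM coding.        */
proc glm data=WORK.IMPORT;
    class gender pepper discipline;
    model quality = gender numYears pepper discipline easiness raterInterest
          / solution;
    /* Least-squares means with pairwise difference p-values and CIs */
    lsmeans pepper discipline / pdiff cl;
run;
quit;
```

The lsmeans output compares adjusted means between levels directly, so it does not depend on which level is treated as the reference.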
Upvotes: 2