Linear Regression: Finding Significant Class Variables Using SAS

Question

I'm attempting to use SAS to do a pretty basic regression problem but I'm having trouble getting the full set of results.

I'm using a data set that includes professors' overall quality (the dependent variable) and has the following independent variables: gender, numYears, pepper, discipline, easiness, and raterInterest.

I'm using the code below to generate the analysis of the data set:

proc glm data=WORK.IMPORT;
    class gender pepper discipline;
    model quality = gender numYears pepper discipline easiness raterInterest;
run;

I get the following results, which are mostly what I need, except that I would like to see exactly which levels of the class variables (gender, pepper, discipline) are significant.

From these results, I can see that easiness, raterInterest, pepper, and discipline are significant; however, I'd like to see which specific values of pepper and discipline are significant. For example, pepper was answered as a 'yes' or 'no' by the student. I'd like to see whether quality is associated specifically with pepperyes or pepperno. Can anyone give me some advice about how to alter my code to return a breakdown of the class variables?

Here is also a link to the dataset (Rateprof), in case it's needed for reference: https://drive.google.com/file/d/1Kc9cb_n-l7qwWRNfzXtZi5OsiY-gsYZC/view?usp=sharing

I really, truly appreciate any assistance!

Stu Sztukowski · Accepted Answer

Adding the solution option to your model statement breaks out the statistics for each level of each class variable. However, proc glm does not support reference parameterization: it uses the singular GLM parameterization, so the class-level estimates it prints are flagged as biased (not uniquely estimable). There are ways around this while staying in proc glm, but the simplest fix is to use proc glmselect instead, which lets you request reference parameterization with param=reference. Use the selection=none option to disable variable selection.

proc glmselect data=WORK.IMPORT;
    class gender pepper discipline / param=reference;
    model quality = gender numYears pepper discipline easiness raterInterest / selection=none;
run;

The interpretation of this would be:

All other variables held constant, the quality rating for female professors is 0.046782 units lower than for male professors. This difference is not statistically significant.
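
To see where that reading comes from, here is a sketch of the fitted model under reference coding, assuming male is the reference level for gender (the default here, since it sorts last). The beta symbols and the indicator notation are illustrative and do not appear in the SAS output; the terms for the other predictors are abbreviated.

```latex
\begin{aligned}
\widehat{\text{quality}} &= \hat\beta_0
  + \hat\beta_{\text{female}}\,\mathbf{1}[\text{gender} = \text{female}]
  + (\text{terms for the other predictors}) \\
\widehat{\text{quality}}_{\text{female}} - \widehat{\text{quality}}_{\text{male}}
  &= \hat\beta_{\text{female}} = -0.046782
\end{aligned}
```

The other terms cancel in the difference because they are held at the same values for both professors.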

The breakdown of each class level is a comparison to a reference value. By default, the reference value selected is the last level after all class values are internally sorted. You can specify a reference using the ref= option after each class variable. For example, if you wanted to use females as a reference value instead of males:

proc glmselect data=WORK.IMPORT;
    class gender(ref='female') pepper discipline / param=reference;
    model quality = gender numYears pepper discipline easiness raterInterest / selection=none;
run;
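
The same syntax extends to more than one class variable. A minimal sketch, assuming the levels are stored exactly as 'female' and 'no' in the data. The ref= string has to match the stored (formatted) value, so it is worth checking the exact spelling and case with proc freq first:

```sas
/* List the stored levels so the ref= strings can be copied exactly */
proc freq data=WORK.IMPORT;
    tables gender pepper discipline;
run;

/* Assumed levels: 'female' for gender, 'no' for pepper.
   discipline keeps the default reference (its last sorted level). */
proc glmselect data=WORK.IMPORT;
    class gender(ref='female') pepper(ref='no') discipline / param=reference;
    model quality = gender numYears pepper discipline easiness raterInterest / selection=none;
run;
```

With this setup, the pepper estimate would read as the difference in quality for 'yes' relative to 'no', all else held constant.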

Note that you can also do this with proc mixed. For this specific purpose, the choice comes down to which output style you prefer. proc mixed is a more flexible way to run regressions, but it is a bit overkill here.

proc mixed data=import;
    class gender pepper discipline;
    model quality = gender numYears pepper discipline easiness raterInterest / solution;
run;
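
For reference, the solution option mentioned at the start of this answer goes on the model statement in proc glm the same way it does in proc mixed. A minimal sketch; the quit statement is added here because proc glm stays active until it is ended explicitly:

```sas
proc glm data=WORK.IMPORT;
    class gender pepper discipline;
    /* solution prints an estimate for each class level; the last sorted
       level of each class variable is set to 0 and acts as the baseline */
    model quality = gender numYears pepper discipline easiness raterInterest / solution;
run;
quit;
```

proc glm typically marks these class-level estimates with a 'B' and a note that they are not uniquely estimable, which is the limitation described at the start of the answer.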
