Reputation: 11639
Say I have a dataset called wage that looks like this:
wage
# A tibble: 935 x 17
wage hours iq kww educ exper tenure age married black south urban sibs brthord meduc
<int> <int> <int> <int> <int> <int> <int> <int> <fctr> <fctr> <fctr> <fctr> <int> <int> <int>
1 769 40 93 35 12 11 2 31 1 0 0 1 1 2 8
2 808 50 119 41 18 11 16 37 1 0 0 1 1 NA 14
3 825 40 108 46 14 11 9 33 1 0 0 1 1 2 14
4 650 40 96 32 12 13 7 32 1 0 0 1 4 3 12
5 562 40 74 27 11 14 5 34 1 0 0 1 10 6 6
6 1400 40 116 43 16 14 2 35 1 1 0 1 1 2 8
7 600 40 91 24 10 13 0 30 0 0 0 1 1 2 8
8 1081 40 114 50 18 8 14 38 1 0 0 1 2 3 8
9 1154 45 111 37 15 13 1 36 1 0 0 0 2 3 14
10 1000 40 95 44 12 16 16 36 1 0 0 1 1 1 12
# ... with 925 more rows, and 2 more variables: feduc <int>, lwage <dbl>
Say I then fit a simple linear regression of wage on IQ:
m_wage_iq = lm(wage ~ iq, data = wage)
m_wage_iq$coefficients
which gives me:
## (Intercept) iq
## 116.991565 8.303064
I want to check that the errors are distributed as:
$\epsilon_i \sim N(0, \sigma^2)$
How do I check this using R?
Upvotes: 0
Views: 2128
Reputation: 37879
There are a number of ways you can try.
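One way would be to run shapiro.test on the model residuals, i.e. shapiro.test(residuals(m_wage_iq)), to test for normality. A p-value greater than your alpha level (typically up to 10%) would mean that the null hypothesis (i.e. the errors are normally distributed) cannot be rejected. However, the test is sensitive to sample size: with many observations (935 here) it can reject for departures from normality that are too small to matter, so you might want to reinforce your results by looking at a Q-Q plot.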
You can get that plot by calling plot(m_wage_iq) and looking at the second graph, or ask for it directly with plot(m_wage_iq, which = 2). If your points lie approximately on the reference line, that would suggest that the errors follow a normal distribution.
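Both checks can be sketched in a few lines. This is only an illustration: the wage data is not reproduced here, so the example simulates a stand-in data frame (sim, m_sim and the numbers used to generate it are made up for the example). With your own model you would use m_wage_iq in place of m_sim.

```r
# Hypothetical stand-in data: the real `wage` tibble is not available here,
# so we simulate something of a similar shape purely for illustration.
set.seed(1)
n <- 200
sim <- data.frame(iq = rnorm(n, mean = 100, sd = 15))
sim$wage <- 117 + 8.3 * sim$iq + rnorm(n, sd = 380)

# Fit the same kind of model as in the question.
m_sim <- lm(wage ~ iq, data = sim)

# The checks are run on the residuals, which estimate the unobserved errors.
res <- residuals(m_sim)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal.
sw <- shapiro.test(res)
print(sw$p.value)

# Normal Q-Q plot of the residuals with a reference line.
qqnorm(res)
qqline(res)

# The same diagnostic as drawn by plot.lm (the "second graph").
plot(m_sim, which = 2)
```

Note that shapiro.test only accepts between 3 and 5000 observations, which is fine for the 935 rows in the question.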
Upvotes: 1