Using R to determine if errors are normally distributed:

Question

Say I have a dataset called wage that looks like this:

 wage
# A tibble: 935 x 17
    wage hours    iq   kww  educ exper tenure   age married  black  south  urban  sibs brthord meduc
 1   769    40    93    35    12    11      2    31       1      0      0      1     1       2     8
 2   808    50   119    41    18    11     16    37       1      0      0      1     1      NA    14
 3   825    40   108    46    14    11      9    33       1      0      0      1     1       2    14
 4   650    40    96    32    12    13      7    32       1      0      0      1     4       3    12
 5   562    40    74    27    11    14      5    34       1      0      0      1    10       6     6
 6  1400    40   116    43    16    14      2    35       1      1      0      1     1       2     8
 7   600    40    91    24    10    13      0    30       0      0      0      1     1       2     8
 8  1081    40   114    50    18     8     14    38       1      0      0      1     2       3     8
 9  1154    45   111    37    15    13      1    36       1      0      0      0     2       3    14
10  1000    40    95    44    12    16     16    36       1      0      0      1     1       1    12
# ... with 925 more rows, and 2 more variables: feduc, lwage

Say I then fit a simple linear regression of wage on IQ:

m_wage_iq = lm(wage ~ iq, data = wage)
m_wage_iq$coefficients

which gives me:

## (Intercept)          iq 
##  116.991565    8.303064

I want to check that the errors are distributed as:

$\epsilon_i \sim N(0, \sigma^2)$

How do I check this using R?

LyzandeR · Accepted Answer

There are a number of ways you can try.

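One way is to apply shapiro.test (the Shapiro-Wilk test) to the model's residuals, residuals(m_wage_iq), since the true errors are unobserved. A p-value greater than your alpha level (commonly 0.05, sometimes as high as 0.10) means that the null hypothesis (i.e. the errors are normally distributed) cannot be rejected. However, the result depends heavily on sample size: with large samples even small, practically unimportant departures from normality produce small p-values, while with small samples the test has little power to detect real departures. So you might want to back up the result by looking at the Q-Q plot.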
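A minimal sketch of the residual-based Shapiro-Wilk check. Because the question's wage data is not reproduced here, the code builds a hypothetical stand-in data frame from simulated values (the column names match the question, the numbers do not); with the real data you would skip the simulation and run the lm() call directly. The last few lines are a hypothetical illustration of the sample-size sensitivity mentioned above, using simulated, mildly heavy-tailed t-distributed values.

```r
# Hypothetical stand-in for the question's `wage` tibble: simulated values,
# not the real data. With the real data, skip straight to lm().
set.seed(1)
n <- 935
wage <- data.frame(iq = round(rnorm(n, mean = 100, sd = 15)))
wage$wage <- 117 + 8.3 * wage$iq + rnorm(n, sd = 380)

m_wage_iq <- lm(wage ~ iq, data = wage)

# The true errors are unobserved, so the test is run on the residuals.
res <- residuals(m_wage_iq)

# Shapiro-Wilk test; shapiro.test() requires between 3 and 5000 values.
sw <- shapiro.test(res)
print(sw)

alpha <- 0.05
if (sw$p.value > alpha) {
  cat("Cannot reject normality of the residuals at alpha =", alpha, "\n")
} else {
  cat("Normality of the residuals is rejected at alpha =", alpha, "\n")
}

# Sample-size sensitivity (simulated, hypothetical): the same mildly
# heavy-tailed t distribution tested at two sample sizes. The large-sample
# p-value is typically far smaller than the small-sample one.
p_small <- shapiro.test(rt(30, df = 10))$p.value
p_large <- shapiro.test(rt(5000, df = 10))$p.value
cat("n = 30:", p_small, " n = 5000:", p_large, "\n")
```

With real data, only the lines from lm() through the alpha comparison are needed; the p-value should be read alongside the Q-Q plot rather than on its own.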

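You can see this by plotting the model (plot(m_wage_iq)) and looking at the second graph, the Normal Q-Q plot of the standardized residuals against theoretical normal quantiles. If the points lie approximately on the dotted reference line (roughly the y = x line, since the residuals are standardized), that suggests the errors follow a normal distribution.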
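A minimal sketch of the graphical check on hypothetical simulated stand-in data (same column names as the question, invented values). The which argument of plot.lm selects a single diagnostic panel (which = 2 is the Normal Q-Q plot); the second half rebuilds that plot by hand with qqnorm() and qqline() and adds the y = x line for comparison. The correlation of the plotted quantile pairs is printed only as an informal summary of how straight the points are, not as a formal test.

```r
# Hypothetical stand-in data (simulated, not the real `wage` data).
set.seed(1)
n <- 935
wage <- data.frame(iq = round(rnorm(n, mean = 100, sd = 15)))
wage$wage <- 117 + 8.3 * wage$iq + rnorm(n, sd = 380)
m_wage_iq <- lm(wage ~ iq, data = wage)

# Only the second diagnostic panel: Normal Q-Q of standardized residuals.
plot(m_wage_iq, which = 2)

# The same idea by hand: standardized residuals against normal quantiles.
std_res <- rstandard(m_wage_iq)
qq <- qqnorm(std_res, main = "Normal Q-Q of standardized residuals")
qqline(std_res, lty = 3)   # line through the first and third quartiles
abline(0, 1, col = "red")  # the y = x line

# Informal summary: values close to 1 mean the points are nearly on a line.
print(cor(qq$x, qq$y))
```

Heavy tails usually show up as points bending away from the line at both ends, and skewness as a curve bending away at one end only.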
