Mohamed Rahouma

Reputation: 1236

In summary.lm(P.for.trend) : essentially perfect fit: summary may be unreliable; How to deal with this?

I used the following data and code to assess the trend of misconduct over the years, but I got strange results from the linear regression model, as you can see below. I have read prior answers about this warning, but I still do not understand my problem. Should I use non-linear regression instead? If so, which regression type would you recommend?

Any input will be greatly appreciated.

dataYear.Pub.MISCONDUCT<-read.table(text= "Year Yes
1965 100.00000   0.00000
1971 100.00000   0.00000
1973 100.00000   0.00000
1974   0.00000 100.00000
1975   0.00000 100.00000
1976   0.00000 100.00000
1977 100.00000   0.00000
1978 100.00000   0.00000
1979  66.66667  33.33333
1980  60.00000  40.00000
1981  70.00000  30.00000
1982  75.00000  25.00000
1983  54.54545  45.45455
1984  50.00000  50.00000
1985  20.00000  80.00000
1986  87.50000  12.50000
1987 100.00000   0.00000
1988  57.14286  42.85714
1989  60.00000  40.00000
1990  61.29032  38.70968
1991  65.00000  35.00000
1992  71.42857  28.57143
1993  43.75000  56.25000
1994  33.33333  66.66667
1995  43.75000  56.25000
1996  40.00000  60.00000
1997  41.46341  58.53659
1998  28.35821  71.64179
1999  17.24138  82.75862
2000  15.62500  84.37500
2001  38.37209  61.62791
2002  36.14458  63.85542
2003  37.14286  62.85714
2004  27.65957  72.34043
2005  32.93413  67.06587
2006  30.58252  69.41748
2007  28.20513  71.79487
2008  32.94574  67.05426
2009  31.06061  68.93939
2010  32.20339  67.79661
2011  33.11475  66.88525
2012  35.95166  64.04834
2013  31.17647  68.82353
2014  25.00000  75.00000
2015  32.27384  67.72616
2016  49.49833  50.50167
2017  55.37849  44.62151
2018  59.67742  40.32258
2019  65.17413  34.82587
2020  65.38462  34.61538 ", sep="", header=T);dataYear.Pub.MISCONDUCT

P.for.trend<-lm(dataYear.Pub.MISCONDUCT$Year~dataYear.Pub.MISCONDUCT$Yes);
summary (P.for.trend)

Results:

> Call:
lm(formula = dataYear.Pub.MISCONDUCT$Year ~ dataYear.Pub.MISCONDUCT$Yes)

Residuals:
       Min         1Q     Median         3Q        Max 
-1.946e-14 -5.051e-15 -2.349e-15  1.044e-15  1.459e-13 

Coefficients:
                              Estimate Std. Error    t value Pr(>|t|)    
(Intercept)                  1.000e+02  6.834e-15  1.463e+16   <2e-16 ***
dataYear.Pub.MISCONDUCT$Yes -1.000e+00  1.184e-16 -8.449e+15   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.241e-14 on 48 degrees of freedom
Multiple R-squared:      1, Adjusted R-squared:      1 
F-statistic: 7.139e+31 on 1 and 48 DF,  p-value: < 2.2e-16

Warning message: In summary.lm(P.for.trend) : essentially perfect fit: summary may be unreliable

Upvotes: 2

Views: 10400

Answers (1)

Chuck P

Reputation: 3923

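There are a few typos in your code: the header line is missing the `No` column name, and the formula has the two variables the wrong way round. Assuming you want to predict the percent Yes from the year, try this: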

P.for.trend <- lm(Yes ~ Year, data = dataYear.Pub.MISCONDUCT)
summary(P.for.trend)
#> 
#> Call:
#> lm(formula = Yes ~ Year, data = dataYear.Pub.MISCONDUCT)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -63.029  -9.305  -5.332  16.556  45.607 
#> 
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)   
#> (Intercept) 1374.4055   488.8403   2.812  0.00712 **
#> Year          -0.6643     0.2450  -2.712  0.00926 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 25.45 on 48 degrees of freedom
#> Multiple R-squared:  0.1328, Adjusted R-squared:  0.1148 
#> F-statistic: 7.353 on 1 and 48 DF,  p-value: 0.009261
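
To read the coefficients above, here is a small sketch that plugs a few years into the fitted line. The values of `b0` and `b1` are copied (rounded) from the printed summary, and `predict_yes` is just a helper name I made up for illustration. With the real model object you would normally call `predict()` instead.

```r
# Rounded coefficients copied from the summary output above
b0 <- 1374.4055   # intercept
b1 <- -0.6643     # change in percent "Yes" per calendar year

# Hypothetical helper: fitted percent "Yes" for a given year
predict_yes <- function(year) b0 + b1 * year

# Fitted values for three example years
print(predict_yes(c(1980, 2000, 2020)))
```

Keep in mind this is a straight line fitted to percentages, so nothing stops it from going below 0 or above 100 if you extrapolate far enough.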

Your data, with the missing `No` column name added to the header:

dataYear.Pub.MISCONDUCT <- 
readr::read_table2("Year Yes No
1965 100.00000   0.00000
1971 100.00000   0.00000
1973 100.00000   0.00000
1974   0.00000 100.00000
1975   0.00000 100.00000
1976   0.00000 100.00000
1977 100.00000   0.00000
1978 100.00000   0.00000
1979  66.66667  33.33333
1980  60.00000  40.00000
1981  70.00000  30.00000
1982  75.00000  25.00000
1983  54.54545  45.45455
1984  50.00000  50.00000
1985  20.00000  80.00000
1986  87.50000  12.50000
1987 100.00000   0.00000
1988  57.14286  42.85714
1989  60.00000  40.00000
1990  61.29032  38.70968
1991  65.00000  35.00000
1992  71.42857  28.57143
1993  43.75000  56.25000
1994  33.33333  66.66667
1995  43.75000  56.25000
1996  40.00000  60.00000
1997  41.46341  58.53659
1998  28.35821  71.64179
1999  17.24138  82.75862
2000  15.62500  84.37500
2001  38.37209  61.62791
2002  36.14458  63.85542
2003  37.14286  62.85714
2004  27.65957  72.34043
2005  32.93413  67.06587
2006  30.58252  69.41748
2007  28.20513  71.79487
2008  32.94574  67.05426
2009  31.06061  68.93939
2010  32.20339  67.79661
2011  33.11475  66.88525
2012  35.95166  64.04834
2013  31.17647  68.82353
2014  25.00000  75.00000
2015  32.27384  67.72616
2016  49.49833  50.50167
2017  55.37849  44.62151
2018  59.67742  40.32258
2019  65.17413  34.82587
2020  65.38462  34.61538")
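
If it helps to see why the original call gave a perfect fit, here is a minimal sketch using three rows of your data and your original two-name header. When the header has one field fewer than the data rows, `read.table()` uses the first column as row names, so the labels `Year` and `Yes` end up on the two percentage columns. The object names `txt`, `shifted` and `fixed` are mine.

```r
# Three rows from the question's data, with the original two-name header
txt <- "Year Yes
1965 100.00000   0.00000
1979  66.66667  33.33333
2000  15.62500  84.37500"

shifted <- read.table(text = txt, header = TRUE)

# The real years became row names; the labels moved one column to the right
print(rownames(shifted))
print(names(shifted))

# The two labelled columns are complementary percentages, so one is an
# exact linear function of the other (hence the "perfect fit" warning)
print(shifted$Year + shifted$Yes)

# Naming all three columns restores the intended layout
fixed <- read.table(text = sub("Year Yes", "Year Yes No", txt, fixed = TRUE),
                    header = TRUE)
print(fixed$Year)
```

That matches the intercept of 100 and slope of -1 in your output: the model was regressing one percentage on its complement, not on the year.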

Upvotes: 3
