Reputation: 1236
I used the following data and code to assess the trend in misconduct over the years, but I got weird results from a linear regression model, as you can see below. I have read some earlier answers but still cannot see what the problem is. Should I use non-linear regression instead? If so, which type of regression would you recommend?
Any input will be greatly appreciated.
dataYear.Pub.MISCONDUCT<-read.table(text= "Year Yes
1965 100.00000 0.00000
1971 100.00000 0.00000
1973 100.00000 0.00000
1974 0.00000 100.00000
1975 0.00000 100.00000
1976 0.00000 100.00000
1977 100.00000 0.00000
1978 100.00000 0.00000
1979 66.66667 33.33333
1980 60.00000 40.00000
1981 70.00000 30.00000
1982 75.00000 25.00000
1983 54.54545 45.45455
1984 50.00000 50.00000
1985 20.00000 80.00000
1986 87.50000 12.50000
1987 100.00000 0.00000
1988 57.14286 42.85714
1989 60.00000 40.00000
1990 61.29032 38.70968
1991 65.00000 35.00000
1992 71.42857 28.57143
1993 43.75000 56.25000
1994 33.33333 66.66667
1995 43.75000 56.25000
1996 40.00000 60.00000
1997 41.46341 58.53659
1998 28.35821 71.64179
1999 17.24138 82.75862
2000 15.62500 84.37500
2001 38.37209 61.62791
2002 36.14458 63.85542
2003 37.14286 62.85714
2004 27.65957 72.34043
2005 32.93413 67.06587
2006 30.58252 69.41748
2007 28.20513 71.79487
2008 32.94574 67.05426
2009 31.06061 68.93939
2010 32.20339 67.79661
2011 33.11475 66.88525
2012 35.95166 64.04834
2013 31.17647 68.82353
2014 25.00000 75.00000
2015 32.27384 67.72616
2016 49.49833 50.50167
2017 55.37849 44.62151
2018 59.67742 40.32258
2019 65.17413 34.82587
2020 65.38462 34.61538 ", sep="", header=T);dataYear.Pub.MISCONDUCT
P.for.trend<-lm(dataYear.Pub.MISCONDUCT$Year~dataYear.Pub.MISCONDUCT$Yes);
summary (P.for.trend)
Results:
> Call:
lm(formula = dataYear.Pub.MISCONDUCT$Year ~ dataYear.Pub.MISCONDUCT$Yes)
Residuals:
Min 1Q Median 3Q Max
-1.946e-14 -5.051e-15 -2.349e-15 1.044e-15 1.459e-13
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.000e+02 6.834e-15 1.463e+16 <2e-16 ***
dataYear.Pub.MISCONDUCT$Yes -1.000e+00 1.184e-16 -8.449e+15 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.241e-14 on 48 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 7.139e+31 on 1 and 48 DF, p-value: < 2.2e-16
Warning message: In summary.lm(P.for.trend) : essentially perfect fit: summary may be unreliable
Upvotes: 2
Views: 10400
Reputation: 3923
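Two things went wrong here. First, the header line in your read.table call names only two columns (Year Yes) for three columns of data, so R uses the years as row names, and the columns it calls Year and Yes actually hold the Yes and No percentages. Those two always sum to 100, which is why you get a perfect fit with intercept 100 and slope -1. Second, the formula is reversed: it predicts the year from the percentage. Assuming you want to predict the percent Yes from the year, add the No header and try: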
P.for.trend <- lm(Yes ~ Year, data = dataYear.Pub.MISCONDUCT)
summary(P.for.trend)
#>
#> Call:
#> lm(formula = Yes ~ Year, data = dataYear.Pub.MISCONDUCT)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -63.029 -9.305 -5.332 16.556 45.607
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1374.4055 488.8403 2.812 0.00712 **
#> Year -0.6643 0.2450 -2.712 0.00926 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 25.45 on 48 degrees of freedom
#> Multiple R-squared: 0.1328, Adjusted R-squared: 0.1148
#> F-statistic: 7.353 on 1 and 48 DF, p-value: 0.009261
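To see where the "essentially perfect fit" came from, here is a minimal sketch on a four-row excerpt of the data from the question. The object names txt, bad and good are only illustrative. It relies on the documented read.table behaviour that, when the header has one field fewer than the data rows, the first column is taken as row names.

```r
# Four rows from the question, with the original two-name header.
txt <- "Year Yes
1965 100.00000 0.00000
1974 0.00000 100.00000
1979 66.66667 33.33333
2020 65.38462 34.61538"

# Header has 2 names but the rows have 3 fields, so the first
# column (the years) silently becomes the row names.
bad <- read.table(text = txt, sep = "", header = TRUE)
rownames(bad)        # the years ended up here
names(bad)           # "Year" and "Yes" now label the Yes and No percentages
bad$Year + bad$Yes   # each row sums to 100, so Year = 100 - Yes exactly

# Adding the missing No header gives three properly named columns.
good <- read.table(text = sub("Year Yes", "Year Yes No", txt, fixed = TRUE),
                   sep = "", header = TRUE)
str(good)
```

Running str() on the data frame right after reading it is a cheap way to catch this kind of mismatch before fitting anything.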
Your data, with the missing No column header added:
dataYear.Pub.MISCONDUCT <-
readr::read_table2("Year Yes No
1965 100.00000 0.00000
1971 100.00000 0.00000
1973 100.00000 0.00000
1974 0.00000 100.00000
1975 0.00000 100.00000
1976 0.00000 100.00000
1977 100.00000 0.00000
1978 100.00000 0.00000
1979 66.66667 33.33333
1980 60.00000 40.00000
1981 70.00000 30.00000
1982 75.00000 25.00000
1983 54.54545 45.45455
1984 50.00000 50.00000
1985 20.00000 80.00000
1986 87.50000 12.50000
1987 100.00000 0.00000
1988 57.14286 42.85714
1989 60.00000 40.00000
1990 61.29032 38.70968
1991 65.00000 35.00000
1992 71.42857 28.57143
1993 43.75000 56.25000
1994 33.33333 66.66667
1995 43.75000 56.25000
1996 40.00000 60.00000
1997 41.46341 58.53659
1998 28.35821 71.64179
1999 17.24138 82.75862
2000 15.62500 84.37500
2001 38.37209 61.62791
2002 36.14458 63.85542
2003 37.14286 62.85714
2004 27.65957 72.34043
2005 32.93413 67.06587
2006 30.58252 69.41748
2007 28.20513 71.79487
2008 32.94574 67.05426
2009 31.06061 68.93939
2010 32.20339 67.79661
2011 33.11475 66.88525
2012 35.95166 64.04834
2013 31.17647 68.82353
2014 25.00000 75.00000
2015 32.27384 67.72616
2016 49.49833 50.50167
2017 55.37849 44.62151
2018 59.67742 40.32258
2019 65.17413 34.82587
2020 65.38462 34.61538")
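On the question of whether something other than a straight line would suit better: because the response is a percentage bounded between 0 and 100, one option is a generalized linear model on the proportion scale. The block below is only a sketch of that idea, not a recommendation specific to your study. It uses an excerpt of the rows above (so its estimates will differ from a fit to the full data), the names d, fit and p2021 are made up for illustration, and it treats every year equally because the per-year counts are not in the post. If you have those counts, a binomial model weighted by them would probably be more appropriate.

```r
# Excerpt of the data above (every fifth year or so), for illustration only.
d <- data.frame(
  Year = c(1965, 1975, 1980, 1985, 1990, 1995, 2000, 2005, 2010, 2015, 2020),
  Yes  = c(100, 0, 60, 20, 61.29032, 43.75, 15.625, 32.93413,
           32.20339, 32.27384, 65.38462)
)

# Model the proportion (0 to 1) with a logit link. quasibinomial is used
# because the response is a fraction rather than a count of successes.
fit <- glm(I(Yes / 100) ~ Year, family = quasibinomial, data = d)
summary(fit)

# Fitted values and predictions stay between 0 and 1, unlike with lm().
p2021 <- predict(fit, newdata = data.frame(Year = 2021), type = "response")
p2021 * 100   # predicted percent Yes for a hypothetical year 2021
```

The Year coefficient here is on the log-odds scale, so it is not directly comparable to the -0.6643 percentage points per year from the linear model.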
Upvotes: 3