user16879601

Linear-log model in R: weird regression line?

I have the following dataset in R.

> eh
        Country PercentUrban GDPCapita UnemRate  FSI   HDI AvgHeight PEG AEG
1           USA           82      59.9      3.7 38.0 0.924     177.0   1   1
2        Canada           81      46.5      5.5 20.0 0.926     175.1   1   0
3     Australia           86      49.4      5.2 19.7 0.939     175.6   1   1
4   New Zealand           87      40.7      3.9 20.1 0.917     177.0   1   1
5            UK           83      44.9      3.9 36.7 0.922     175.3   0   0
6       Ireland           63      76.7      5.3 20.6 0.938     177.5   0   0
7       Iceland           94      55.3      4.4 19.8 0.935     181.0   1   1
8        Norway           82      62.2      3.8 18.0 0.953     179.7   1   1
9        Sweden           87      51.4      7.1 20.3 0.933     181.5   0   0
10      Finland           85      46.3      5.9 16.9 0.920     180.7   0   1
11      Denmark           88      54.3      3.8 19.5 0.929     180.4   1   1
12      Germany           77      52.6      3.1 24.7 0.936     178.1   0   0
13       France           80      44.0      8.5 32.0 0.901     175.6   0   0
14  Netherlands           91      54.4      3.5 24.8 0.931     180.8   0   0
15      Belgium           98      49.4      5.5 28.6 0.916     178.6   0   0
16   Luxembourg           91     107.6      5.3 20.4 0.904     179.9   1   1
17      Austria           58      53.9      6.7 25.0 0.908     179.0   1   0
18  Switzerland           74      66.3      2.1 18.7 0.944     175.4   1   1
19        Spain           80      39.0     14.1 40.7 0.891     174.2   0   0
20     Portugal           65      32.6      6.3 25.3 0.847     173.9   0   0
21      Ukraine           69       8.7      7.8 71.0 0.751     172.4   0   0
22       Russia           74      25.8      4.5 74.7 0.816     177.2   0   0
23        Italy           70      40.9      9.5 43.8 0.880     177.3   0   0
24     Slovenia           55      36.4      7.4 28.0 0.896     180.3   1   1
25     Slovakia           54      32.3      5.0 40.5 0.855     179.4   1   0
26      Czechia           74      38.0      2.7 37.6 0.888     180.2   0   1
27       Poland           60      29.9      5.2 42.8 0.865     178.7   0   0
28      Hungary           71      28.8      3.4 49.6 0.838     177.3   1   1
29      Romania           54      26.7      3.8 47.8 0.811     171.8   0   0
30     Bulgaria           75      20.9      5.3 50.6 0.813     175.2   1   1
31       Greece           79      28.6     16.9 53.9 0.870     176.9   0   0
32       Turkey           75      28.0     13.9 80.3 0.791     173.6   0   0
33  South Korea           81      38.8      3.4 33.7 0.903     173.5   1   1
34        Japan           92      42.1      2.2 34.3 0.909     172.1   1   1
35 South Africa           66      13.5     29.0 71.1 0.699     167.8   0   0
36      Nigeria           50       5.9     23.1 98.5 0.532     167.2   0   1
37       Brazil           87      15.6     11.8 71.8 0.759     172.5   1   1
38    Argentina           92      20.8     10.6 46.0 0.825     174.1   1   0
39    Indonesia           55      12.3      5.0 70.4 0.694     158.1   1   1
40        India           34       7.2      6.0 74.4 0.640     166.3   1   1
41        China           59      16.8      3.6 71.1 0.752     169.5   1   0
42        Egypt           43      11.6      7.5 88.4 0.696     170.3   0   0
43     Colombia           81      14.5     10.8 75.7 0.747     170.6   0   1

I'm trying to fit a linear-log model with X being GDP (GDPCapita) and Y being FSI. So far I have done:

> linlogmodel_eh<-lm(formula=eh$FSI~log(eh$GDPCapita))
> summary(linlogmodel_eh)

Call:
lm(formula = eh$FSI ~ log(eh$GDPCapita))

Residuals:
    Min      1Q  Median      3Q     Max 
-16.906  -6.335  -1.334   4.805  33.343 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)        151.034      8.770   17.22  < 2e-16 ***
log(eh$GDPCapita)  -31.234      2.491  -12.54 1.29e-15 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 10.54 on 41 degrees of freedom
Multiple R-squared:  0.7932,    Adjusted R-squared:  0.7882 
F-statistic: 157.3 on 1 and 41 DF,  p-value: 1.287e-15

> plot(eh$GDPCapita, eh$FSI, xlim=c(3, 152), ylim=c(15, 100))
> abline(151.034, -31.234)

Unfortunately, when I do this and plot both the scatterplot and the regression line, I get an oddly straight-looking regression line. Is this the correct line? It looks very wrong visually.


Any advice on what I am doing wrong or what I need to fix? I'm not sure what is wrong here or how to fix it.

Upvotes: 0

Views: 103

Answers (1)

Ben Bolker

Reputation: 226182

abline() doesn't know you've transformed your x-variable: abline(a, b) draws the straight line y = a + b*x on the untransformed x-axis. Your model is y = a + b*log(x), so you need

curve(151.034+(-31.234)*log(x), add = TRUE)
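To avoid retyping rounded coefficients, the same curve can be drawn from the fitted object. This is a minimal sketch: the data frame below is only a handful of rows copied from the question (so its coefficients will differ from the full-data fit), and the names fit, a and b are illustrative.

```r
# A few rows copied from the question's data (a subset, for brevity)
eh <- data.frame(
  GDPCapita = c(59.9, 46.5, 76.7, 8.7, 25.8, 5.9, 7.2, 107.6),
  FSI       = c(38.0, 20.0, 20.6, 71.0, 74.7, 98.5, 74.4, 20.4)
)

# Linear-log model: FSI = a + b * log(GDPCapita)
fit <- lm(FSI ~ log(GDPCapita), data = eh)
a <- coef(fit)[["(Intercept)"]]
b <- coef(fit)[["log(GDPCapita)"]]

# Scatterplot on the raw x scale, then the fitted curve (not a straight line)
plot(FSI ~ GDPCapita, data = eh)
curve(a + b * log(x), add = TRUE)
```

Because curve() evaluates the expression over the current x-axis range, the line follows the logarithmic shape of the model rather than a straight line.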

By the way, this is an unusual "log-linear" relationship. The more usual form (which gives rise to exponential curves) is log(y) = a + b*x, i.e. y = exp(a)*exp(b*x).
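For comparison, the exponential form can be fitted and drawn on the original scale in the same way. This is a sketch on a small subset of the question's rows; fit_exp, a and b are illustrative names, and whether this form suits the data better is not something the subset can show.

```r
# A few rows copied from the question's data (a subset, for brevity)
eh <- data.frame(
  GDPCapita = c(59.9, 46.5, 76.7, 8.7, 25.8, 5.9, 7.2, 107.6),
  FSI       = c(38.0, 20.0, 20.6, 71.0, 74.7, 98.5, 74.4, 20.4)
)

# Log-linear model: log(FSI) = a + b * GDPCapita
fit_exp <- lm(log(FSI) ~ GDPCapita, data = eh)
a <- coef(fit_exp)[["(Intercept)"]]
b <- coef(fit_exp)[["GDPCapita"]]

# Back-transform to the original y scale: FSI = exp(a) * exp(b * GDPCapita)
plot(FSI ~ GDPCapita, data = eh)
curve(exp(a) * exp(b * x), add = TRUE)
```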

Also, as a general best practice, I would recommend

lm(formula=log(FSI) ~ GDPCapita, data = eh)

(or FSI ~ log(GDPCapita) if you really want that version); using the data= argument makes your code easier to read and makes downstream methods like predict(), etc. more convenient.
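As an illustration of the predict() point, here is a sketch that draws the fitted curve from predictions on a grid of x values. It uses a small subset of the question's rows, and grid and FSI_hat are made-up names. With eh$GDPCapita written inside the formula, predict() cannot match the columns of newdata, which is one reason the data= form is more convenient.

```r
# A few rows copied from the question's data (a subset, for brevity)
eh <- data.frame(
  GDPCapita = c(59.9, 46.5, 76.7, 8.7, 25.8, 5.9, 7.2, 107.6),
  FSI       = c(38.0, 20.0, 20.6, 71.0, 74.7, 98.5, 74.4, 20.4)
)

fit <- lm(FSI ~ log(GDPCapita), data = eh)

# Evenly spaced x values across the observed range
grid <- data.frame(
  GDPCapita = seq(min(eh$GDPCapita), max(eh$GDPCapita), length.out = 200)
)

# predict() applies log() to the new GDPCapita values automatically
grid$FSI_hat <- predict(fit, newdata = grid)

plot(FSI ~ GDPCapita, data = eh)
lines(grid$GDPCapita, grid$FSI_hat)
```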

Upvotes: 2
