Finding p-value and z statistics along with the OLS Linear regression

I could find the coefficients and intercepts from linear regression but unable to find a suitable method to get p-value and z value for respective variable trend. Additionally, not able to find a method to save the output results in excel format. The data is here. There are 24 variables against time. I am not getting the z-statistics and p-values, additionally estimates are also incorrect by first method. where am i wrong?

library("trend")

# read ozone data (I converted to a text file first)
otm <- read.table("D:/data.txt",header=T)


#  make a data frame version
otm_df <- data.frame(otm)
markers <- sample(0:1, replace = T, size = 11)

# calculate OLS slope for all columns
# the -1 at end removes the intercepts
ols <- sapply(otm_df, function(x) coef(lm(markers ~ x))[-1])

I tried this method. I didn't get the z-statistics and could not save it to excel format.

library(reshape2)
DF <- reshape2::melt(otm, id.var = "Year")
library(broom); library(tidyverse)
ols <- DF %>% nest(data = -variable) %>% 
  mutate(model = map(data, ~lm(value ~ Year, data = .)), 
         tidied = map(model, tidy)) %>% 
  unnest(tidied)

#to save the results in excel format (not working here for me)
capture.output(summary(ols), file = "ols.csv" )
write.csv(ols, file.path('E:/',filename = "ols2.csv"), row.names = TRUE)

# A tibble: 48 x 8
   variable data              model  term         estimate std.error statistic p.value
   <fct>    <list>            <list> <chr>           <dbl>     <dbl>     <dbl>   <dbl>
 1 BanTES   <tibble [11 x 2]> <lm>   (Intercept) -236.       488.       -0.483   0.641
 2 BanTES   <tibble [11 x 2]> <lm>   Year           0.139      0.242     0.572   0.582
 3 SriTES   <tibble [11 x 2]> <lm>   (Intercept)  220.       351.        0.627   0.546
 4 SriTES   <tibble [11 x 2]> <lm>   Year          -0.0935     0.174    -0.536   0.605
 5 AfgTES   <tibble [11 x 2]> <lm>   (Intercept)  364.       444.        0.820   0.434
 6 AfgTES   <tibble [11 x 2]> <lm>   Year          -0.161      0.221    -0.730   0.484
 7 BhuTES   <tibble [11 x 2]> <lm>   (Intercept)  373.       831.        0.449   0.664
 8 BhuTES   <tibble [11 x 2]> <lm>   Year          -0.170      0.413    -0.412   0.690
 9 IndTES   <tibble [11 x 2]> <lm>   (Intercept) -342.       213.       -1.60    0.143
10 IndTES   <tibble [11 x 2]> <lm>   Year           0.190      0.106     1.80    0.106

summary(ols)

    variable  data.Length  data.Class  data.Mode model.Length  model.Class  model.Mode     term          
 BanTES : 2   2       tbl_df  list               12    lm    list                      Length:48         
 SriTES : 2   2       tbl_df  list               12    lm    list                      Class :character  
 AfgTES : 2   2       tbl_df  list               12    lm    list                      Mode  :character  
 BhuTES : 2   2       tbl_df  list               12    lm    list                                        
 IndTES : 2   2       tbl_df  list               12    lm    list                                        
 NepTES : 2   2       tbl_df  list               12    lm    list                                        
 (Other):36   2       tbl_df  list               12    lm    list

Any help will be useful. Thank you in advance !

Upvotes: 4

Answers (3)

UseR10085

Reputation: 8198

Linear regression does not give you Z statistic as rightly commented by @Roland rather linear regression gives you t statistic. You can use the following code to save the coeff, t statistic, and p-value in excel format

library(tidyverse)
library(broom)
library(openxlsx)

# read ozone data (I converted to a text file first)
otm <- read.table("data.txt", header=T)
head(otm, 2)

otm %>% 
  pivot_longer(-c("Year")) %>%
  group_by(name) %>% 
  do(fitlm = tidy(lm(value ~ Year, data = ., na.action=na.omit))) %>% 
  unnest(fitlm) %>% 
  write.xlsx("Linear_trend_1.xlsx")

In the "Linear_trend_1.xlsx" file, Year row for each location provides you the slope and the statistic is t statistic.

Your values from the first method are not matching with the second method because the first method uses markers as the dependent variable and all the variables in the otm_df were used as the independent variables. In the second method, Year was used as independent variable while all the location values were the dependent variable.
Another thing, you are using sample function to create markers which will give you different outputs for every run. So you have to use set.seed to make it constant for every run like

set.seed(123)
otm %>% 
  mutate(markers = sample(0:1, replace = T, size = 11)) %>% 
  pivot_longer(-c("Year", "markers")) %>%
  group_by(name) %>% 
  do(fitlm = tidy(lm(markers ~ value, data = ., na.action=na.omit))) %>% 
  unnest(fitlm) %>% 
  write.xlsx("Linear_trend_2.xlsx")

# A tibble: 48 x 6
   name   term        estimate std.error statistic p.value
   <chr>  <chr>          <dbl>     <dbl>     <dbl>   <dbl>
 1 AfgERA (Intercept)  -8.58      7.79      -1.10    0.299
 2 AfgERA value         0.577     0.498      1.16    0.276
 3 AfgTES (Intercept)  -1.62      2.99      -0.542   0.601
 4 AfgTES value         0.0521    0.0750     0.695   0.505
 5 AfgTOM (Intercept) -25.3      19.5       -1.30    0.225
 6 AfgTOM value         1.94      1.46       1.33    0.218
 7 BanERA (Intercept)  -2.28      5.03      -0.453   0.662
 8 BanERA value         0.201     0.370      0.543   0.600
 9 BanTES (Intercept)  -3.42      2.80      -1.22    0.253
10 BanTES value         0.0892    0.0644     1.39    0.199
# ... with 38 more rows

If I modify your first approach like the following, you can see that the output from the above code and your code is basically same

#  make a data frame version
otm_df <- data.frame(otm)
set.seed(123) #To have same output from sample function for every run
markers <- sample(0:1, replace = T, size = 11)

# calculate OLS slope for all columns
# the -1 at end removes the intercepts
ols <- sapply(otm_df, function(x) coef(lm(markers ~ x))[-1])

Year.x     BanTES.x     SriTES.x     AfgTES.x     BhuTES.x     IndTES.x 
 0.054545455  0.089176876  0.092629314  0.052132553  0.001602003  0.107446434 
    NepTES.x     PakTES.x      SATES.x     BanERA.x     SriERA.x     AfgERA.x 
 0.065125607 -0.115108438  0.315928753  0.200712285 -0.519996739  0.577317012 
    BhuERA.x     IndERA.x     NepERA.x     PakERA.x      SAERA.x     BanTOM.x 
-0.140990921  0.110578204 -0.030546686  0.265909319  0.176797510  1.897234338 
    SriTOM.x     AfgTOM.x     BhuTOM.x     IndTOm.x     NepTOM.x     PakTOM.x 
 5.380610477  1.935170281  1.761172711  2.248531891  2.107380452  2.011356580 
     SATOM.x 
 2.214848959

Here is the data in dput format

dput(otm)
structure(list(Year = 2008:2018, BanTES = c(44.06247376, 43.81239107, 
40.68010622, 46.97760506, 37.49591135, 43.81239107, 43.81239107, 
43.81239107, 43.81239107, 45.27803189, 44.06247376), SriTES = c(32.07268265, 
35.01918477, 29.91018035, 34.1291577, 28.5258431, 32.07268265, 
32.07268265, 32.07268265, 32.07268265, 30.96753552, 32.07268265
), AfgTES = c(39.19328867, 42.06325898, 42.31015918, 40.54543762, 
34.28696385, 40.54543762, 40.54543762, 40.54543762, 40.54543762, 
37.38974643, 39.19328867), BhuTES = c(34.08241824, 36.95440954, 
30.41561338, 30.37004394, 19.8861367, 30.41561338, 30.41561338, 
30.41561338, 30.41561338, 32.09933763, 32.09933763), IndTES = c(40.05913352, 
41.54741392, 38.88957844, 42.47544504, 43.24350644, 41.54741392, 
41.54741392, 41.54741392, 41.54741392, 42.65820983, 42.47544504
), NepTES = c(38.12979871, 37.62785275, 34.40488247, 37.7995467, 
39.64286364, 37.7995467, 37.7995467, 37.7995467, 37.7995467, 
30.63632105, 37.7995467), PakTES = c(41.38388734, 41.99865359, 
42.16236093, 42.51838941, 43.4444952, 42.16236093, 42.16236093, 
42.16236093, 42.16236093, 44.96627251, 42.51838941), SATES = c(40.03077, 
41.52302, 39.6327, 41.9098, 41.11191, 41.11191, 41.11191, 41.11191, 
41.11191, 41.57009, 41.52302), BanERA = c(13.76686693, 13.904453, 
13.40584856, 13.45199721, 13.47657436, 12.8992102, 13.3586098, 
14.23223365, 13.4228729, 13.21487616, 14.50830571), SriERA = c(11.81852768, 
11.79187354, 11.51484349, 11.50552588, 11.489789, 11.23384852, 
10.61182708, 11.33951759, 11.6357584, 11.74685028, 12.14987906
), AfgERA = c(15.44115983, 15.425995, 15.6161623, 15.47751927, 
15.81748069, 15.47498417, 15.41748855, 16.06462541, 15.61143062, 
15.32810621, 16.39162424), BhuERA = c(14.34493453, 14.28085419, 
14.24728543, 14.03202106, 14.04152053, 13.42977221, 13.22665229, 
14.344052, 13.58792484, 13.28851619, 14.28029524), IndERA = c(14.08262362, 
14.11485037, 13.80713493, 13.86114379, 13.92607879, 13.37996473, 
13.45767152, 14.49365275, 13.88142768, 13.73986257, 14.77032404
), NepERA = c(14.93883379, 14.896056, 14.78607828, 14.50880606, 
14.69793299, 13.96309811, 14.18825383, 15.32530354, 14.38700954, 
13.98545482, 14.9828434), PakERA = c(14.89773191, 14.87337075, 
14.89837223, 14.76236826, 15.13918051, 14.61385609, 14.54589641, 
15.40150813, 14.8588883, 14.62185208, 15.6575491), SAERA = c(14.38877, 
14.40468, 14.22069, 14.20561, 14.35855, 13.8399, 13.88027, 14.83054, 
14.24554, 14.06201, 15.09615), BanTOM = c(9.317937851, 9.308046341, 
9.327401161, 9.319338799, 9.311285019, 9.300317764, 9.292790413, 
9.540946007, 9.04840374, 9.300317764, 9.317937851), SriTOM = c(5.437336445, 
5.436554909, 5.435492039, 5.440690994, 5.436693192, 5.440601349, 
5.427892685, 5.54946661, 5.427827358, 5.440601349, 5.437336445
), AfgTOM = c(13.31581736, 13.30339324, 13.30090284, 13.29781604, 
13.33800817, 13.31919873, 13.30073023, 13.62503445, 13.16488469, 
13.30073023, 13.31581736), BhuTOM = c(11.69911337, 11.67375898, 
11.71142626, 11.69099903, 11.68556881, 11.68714046, 11.65387106, 
11.97064924, 11.44872904, 11.67375898, 11.69099903), IndTOm = c(9.709311898, 
9.704142364, 9.703938368, 9.72520479, 9.709638531, 9.708799697, 
9.690851817, 9.952517961, 9.499369441, 9.704142364, 9.709638531
), NepTOM = c(12.45835066, 12.43677187, 12.48850822, 12.49002218, 
12.46283817, 12.50376368, 12.44072294, 12.78685617, 12.27891684, 
12.44072294, 12.46283817), PakTOM = c(12.37911913, 12.38028261, 
12.37067625, 12.38315158, 12.38352468, 12.36856567, 12.37349086, 
12.67422019, 12.18962786, 12.37349086, 12.38315158), SATOM = c(10.63543, 
10.62967, 10.62981, 10.64489, 10.63941, 10.63525, 10.6195, 10.89613, 
10.44028, 10.62967, 10.63941)), class = "data.frame", row.names = c(NA, 
-11L))

Upvotes: 3

Mata

Reputation: 665

You can use glance instead of tidy for the p-value:

ols <- DF %>% 
  nest(data = -variable) %>% 
  mutate(model  = map(data, ~lm(value ~ Year, data = .)), 
         tidied = map(model, glance)) %>% 
  unnest(tidied)

> ols
# A tibble: 24 x 15
   variable data              model  r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC deviance df.residual  nobs
   <fct>    <list>            <list>     <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
 1 BanTES   <tibble [11 x 2]> <lm>     0.0350        -0.0722 2.54     0.327    0.582     1 -24.8   55.5  56.7    58.2            9    11
 2 SriTES   <tibble [11 x 2]> <lm>     0.0309        -0.0767 1.83     0.287    0.605     1 -21.2   48.3  49.5    30.1            9    11
 3 AfgTES   <tibble [11 x 2]> <lm>     0.0559        -0.0490 2.32     0.533    0.484     1 -23.7   53.5  54.7    48.2            9    11
 4 BhuTES   <tibble [11 x 2]> <lm>     0.0185        -0.0905 4.33     0.170    0.690     1 -30.6   67.3  68.4   169.             9    11
 5 IndTES   <tibble [11 x 2]> <lm>     0.264          0.183  1.11     3.23     0.106     1 -15.7   37.3  38.5    11.1            9    11
 6 NepTES   <tibble [11 x 2]> <lm>     0.0689        -0.0345 2.49     0.666    0.435     1 -24.5   55.0  56.2    55.6            9    11
 7 PakTES   <tibble [11 x 2]> <lm>     0.243          0.159  0.872    2.89     0.123     1 -13.0   32.0  33.2     6.84           9    11
 8 SATES    <tibble [11 x 2]> <lm>     0.221          0.135  0.625    2.56     0.144     1  -9.34  24.7  25.9     3.52           9    11
 9 BanERA   <tibble [11 x 2]> <lm>     0.0252        -0.0831 0.482    0.233    0.641     1  -6.49  19.0  20.2     2.09           9    11
10 SriERA   <tibble [11 x 2]> <lm>     0.00230       -0.109  0.416    0.0208   0.889     1  -4.87  15.7  16.9     1.56           9    11

Upvotes: 2

Guilherme Salome

Reputation: 2039

Regarding saving the data to Excel, if you have your data in a tibble or data.frame, then you can use the function:

readr::write_csv to save the data as a CSV file, which can be easily opened in Excel;
writexl::write_xlsx to save the data as a native Excel file.

Either function will save all rows and columns in your data to a file.

Upvotes: 2

Finding p-value and z statistics along with the OLS Linear regression

Answers (3)

Related Questions