Reputation: 87
I could find the coefficients and intercepts from linear regression but unable to find a suitable method to get p-value and z value for respective variable trend. Additionally, not able to find a method to save the output results in excel format. The data is here. There are 24 variables against time. I am not getting the z-statistics and p-values, additionally estimates are also incorrect by first method. where am i wrong?
library("trend")
# read ozone data (I converted to a text file first)
otm <- read.table("D:/data.txt",header=T)
# make a data frame version
otm_df <- data.frame(otm)
markers <- sample(0:1, replace = T, size = 11)
# calculate OLS slope for all columns
# the -1 at end removes the intercepts
ols <- sapply(otm_df, function(x) coef(lm(markers ~ x))[-1])
I tried this method. I didn't get the z-statistics and could not save it to excel format.
library(reshape2)
DF <- reshape2::melt(otm, id.var = "Year")
library(broom); library(tidyverse)
ols <- DF %>% nest(data = -variable) %>%
mutate(model = map(data, ~lm(value ~ Year, data = .)),
tidied = map(model, tidy)) %>%
unnest(tidied)
#to save the results in excel format (not working here for me)
capture.output(summary(ols), file = "ols.csv" )
write.csv(ols, file.path('E:/',filename = "ols2.csv"), row.names = TRUE)
# A tibble: 48 x 8
variable data model term estimate std.error statistic p.value
<fct> <list> <list> <chr> <dbl> <dbl> <dbl> <dbl>
1 BanTES <tibble [11 x 2]> <lm> (Intercept) -236. 488. -0.483 0.641
2 BanTES <tibble [11 x 2]> <lm> Year 0.139 0.242 0.572 0.582
3 SriTES <tibble [11 x 2]> <lm> (Intercept) 220. 351. 0.627 0.546
4 SriTES <tibble [11 x 2]> <lm> Year -0.0935 0.174 -0.536 0.605
5 AfgTES <tibble [11 x 2]> <lm> (Intercept) 364. 444. 0.820 0.434
6 AfgTES <tibble [11 x 2]> <lm> Year -0.161 0.221 -0.730 0.484
7 BhuTES <tibble [11 x 2]> <lm> (Intercept) 373. 831. 0.449 0.664
8 BhuTES <tibble [11 x 2]> <lm> Year -0.170 0.413 -0.412 0.690
9 IndTES <tibble [11 x 2]> <lm> (Intercept) -342. 213. -1.60 0.143
10 IndTES <tibble [11 x 2]> <lm> Year 0.190 0.106 1.80 0.106
summary(ols)
variable data.Length data.Class data.Mode model.Length model.Class model.Mode term
BanTES : 2 2 tbl_df list 12 lm list Length:48
SriTES : 2 2 tbl_df list 12 lm list Class :character
AfgTES : 2 2 tbl_df list 12 lm list Mode :character
BhuTES : 2 2 tbl_df list 12 lm list
IndTES : 2 2 tbl_df list 12 lm list
NepTES : 2 2 tbl_df list 12 lm list
(Other):36 2 tbl_df list 12 lm list
Any help will be useful. Thank you in advance !
Upvotes: 4
Views: 680
Reputation: 8198
Linear regression does not give you Z statistic as rightly commented by @Roland rather linear regression gives you t statistic. You can use the following code to save the coeff, t statistic, and p-value in excel format
library(tidyverse)
library(broom)
library(openxlsx)
# read ozone data (I converted to a text file first)
otm <- read.table("data.txt", header=T)
head(otm, 2)
otm %>%
pivot_longer(-c("Year")) %>%
group_by(name) %>%
do(fitlm = tidy(lm(value ~ Year, data = ., na.action=na.omit))) %>%
unnest(fitlm) %>%
write.xlsx("Linear_trend_1.xlsx")
In the "Linear_trend_1.xlsx" file, Year row for each location provides you the slope and the statistic is t statistic.
Your values from the first method are not matching with the second method because the first method uses markers
as the dependent variable and all the variables in the otm_df
were used as the independent variables. In the second method, Year
was used as independent variable while all the location values were the dependent variable.
Another thing, you are using sample
function to create markers
which will give you different outputs for every run. So you have to use set.seed
to make it constant for every run like
set.seed(123)
otm %>%
mutate(markers = sample(0:1, replace = T, size = 11)) %>%
pivot_longer(-c("Year", "markers")) %>%
group_by(name) %>%
do(fitlm = tidy(lm(markers ~ value, data = ., na.action=na.omit))) %>%
unnest(fitlm) %>%
write.xlsx("Linear_trend_2.xlsx")
# A tibble: 48 x 6
name term estimate std.error statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 AfgERA (Intercept) -8.58 7.79 -1.10 0.299
2 AfgERA value 0.577 0.498 1.16 0.276
3 AfgTES (Intercept) -1.62 2.99 -0.542 0.601
4 AfgTES value 0.0521 0.0750 0.695 0.505
5 AfgTOM (Intercept) -25.3 19.5 -1.30 0.225
6 AfgTOM value 1.94 1.46 1.33 0.218
7 BanERA (Intercept) -2.28 5.03 -0.453 0.662
8 BanERA value 0.201 0.370 0.543 0.600
9 BanTES (Intercept) -3.42 2.80 -1.22 0.253
10 BanTES value 0.0892 0.0644 1.39 0.199
# ... with 38 more rows
If I modify your first approach like the following, you can see that the output from the above code and your code is basically same
# make a data frame version
otm_df <- data.frame(otm)
set.seed(123) #To have same output from sample function for every run
markers <- sample(0:1, replace = T, size = 11)
# calculate OLS slope for all columns
# the -1 at end removes the intercepts
ols <- sapply(otm_df, function(x) coef(lm(markers ~ x))[-1])
Year.x BanTES.x SriTES.x AfgTES.x BhuTES.x IndTES.x
0.054545455 0.089176876 0.092629314 0.052132553 0.001602003 0.107446434
NepTES.x PakTES.x SATES.x BanERA.x SriERA.x AfgERA.x
0.065125607 -0.115108438 0.315928753 0.200712285 -0.519996739 0.577317012
BhuERA.x IndERA.x NepERA.x PakERA.x SAERA.x BanTOM.x
-0.140990921 0.110578204 -0.030546686 0.265909319 0.176797510 1.897234338
SriTOM.x AfgTOM.x BhuTOM.x IndTOm.x NepTOM.x PakTOM.x
5.380610477 1.935170281 1.761172711 2.248531891 2.107380452 2.011356580
SATOM.x
2.214848959
Here is the data in dput
format
dput(otm)
structure(list(Year = 2008:2018, BanTES = c(44.06247376, 43.81239107,
40.68010622, 46.97760506, 37.49591135, 43.81239107, 43.81239107,
43.81239107, 43.81239107, 45.27803189, 44.06247376), SriTES = c(32.07268265,
35.01918477, 29.91018035, 34.1291577, 28.5258431, 32.07268265,
32.07268265, 32.07268265, 32.07268265, 30.96753552, 32.07268265
), AfgTES = c(39.19328867, 42.06325898, 42.31015918, 40.54543762,
34.28696385, 40.54543762, 40.54543762, 40.54543762, 40.54543762,
37.38974643, 39.19328867), BhuTES = c(34.08241824, 36.95440954,
30.41561338, 30.37004394, 19.8861367, 30.41561338, 30.41561338,
30.41561338, 30.41561338, 32.09933763, 32.09933763), IndTES = c(40.05913352,
41.54741392, 38.88957844, 42.47544504, 43.24350644, 41.54741392,
41.54741392, 41.54741392, 41.54741392, 42.65820983, 42.47544504
), NepTES = c(38.12979871, 37.62785275, 34.40488247, 37.7995467,
39.64286364, 37.7995467, 37.7995467, 37.7995467, 37.7995467,
30.63632105, 37.7995467), PakTES = c(41.38388734, 41.99865359,
42.16236093, 42.51838941, 43.4444952, 42.16236093, 42.16236093,
42.16236093, 42.16236093, 44.96627251, 42.51838941), SATES = c(40.03077,
41.52302, 39.6327, 41.9098, 41.11191, 41.11191, 41.11191, 41.11191,
41.11191, 41.57009, 41.52302), BanERA = c(13.76686693, 13.904453,
13.40584856, 13.45199721, 13.47657436, 12.8992102, 13.3586098,
14.23223365, 13.4228729, 13.21487616, 14.50830571), SriERA = c(11.81852768,
11.79187354, 11.51484349, 11.50552588, 11.489789, 11.23384852,
10.61182708, 11.33951759, 11.6357584, 11.74685028, 12.14987906
), AfgERA = c(15.44115983, 15.425995, 15.6161623, 15.47751927,
15.81748069, 15.47498417, 15.41748855, 16.06462541, 15.61143062,
15.32810621, 16.39162424), BhuERA = c(14.34493453, 14.28085419,
14.24728543, 14.03202106, 14.04152053, 13.42977221, 13.22665229,
14.344052, 13.58792484, 13.28851619, 14.28029524), IndERA = c(14.08262362,
14.11485037, 13.80713493, 13.86114379, 13.92607879, 13.37996473,
13.45767152, 14.49365275, 13.88142768, 13.73986257, 14.77032404
), NepERA = c(14.93883379, 14.896056, 14.78607828, 14.50880606,
14.69793299, 13.96309811, 14.18825383, 15.32530354, 14.38700954,
13.98545482, 14.9828434), PakERA = c(14.89773191, 14.87337075,
14.89837223, 14.76236826, 15.13918051, 14.61385609, 14.54589641,
15.40150813, 14.8588883, 14.62185208, 15.6575491), SAERA = c(14.38877,
14.40468, 14.22069, 14.20561, 14.35855, 13.8399, 13.88027, 14.83054,
14.24554, 14.06201, 15.09615), BanTOM = c(9.317937851, 9.308046341,
9.327401161, 9.319338799, 9.311285019, 9.300317764, 9.292790413,
9.540946007, 9.04840374, 9.300317764, 9.317937851), SriTOM = c(5.437336445,
5.436554909, 5.435492039, 5.440690994, 5.436693192, 5.440601349,
5.427892685, 5.54946661, 5.427827358, 5.440601349, 5.437336445
), AfgTOM = c(13.31581736, 13.30339324, 13.30090284, 13.29781604,
13.33800817, 13.31919873, 13.30073023, 13.62503445, 13.16488469,
13.30073023, 13.31581736), BhuTOM = c(11.69911337, 11.67375898,
11.71142626, 11.69099903, 11.68556881, 11.68714046, 11.65387106,
11.97064924, 11.44872904, 11.67375898, 11.69099903), IndTOm = c(9.709311898,
9.704142364, 9.703938368, 9.72520479, 9.709638531, 9.708799697,
9.690851817, 9.952517961, 9.499369441, 9.704142364, 9.709638531
), NepTOM = c(12.45835066, 12.43677187, 12.48850822, 12.49002218,
12.46283817, 12.50376368, 12.44072294, 12.78685617, 12.27891684,
12.44072294, 12.46283817), PakTOM = c(12.37911913, 12.38028261,
12.37067625, 12.38315158, 12.38352468, 12.36856567, 12.37349086,
12.67422019, 12.18962786, 12.37349086, 12.38315158), SATOM = c(10.63543,
10.62967, 10.62981, 10.64489, 10.63941, 10.63525, 10.6195, 10.89613,
10.44028, 10.62967, 10.63941)), class = "data.frame", row.names = c(NA,
-11L))
Upvotes: 3
Reputation: 665
You can use glance
instead of tidy
for the p-value:
ols <- DF %>%
nest(data = -variable) %>%
mutate(model = map(data, ~lm(value ~ Year, data = .)),
tidied = map(model, glance)) %>%
unnest(tidied)
> ols
# A tibble: 24 x 15
variable data model r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
<fct> <list> <list> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 BanTES <tibble [11 x 2]> <lm> 0.0350 -0.0722 2.54 0.327 0.582 1 -24.8 55.5 56.7 58.2 9 11
2 SriTES <tibble [11 x 2]> <lm> 0.0309 -0.0767 1.83 0.287 0.605 1 -21.2 48.3 49.5 30.1 9 11
3 AfgTES <tibble [11 x 2]> <lm> 0.0559 -0.0490 2.32 0.533 0.484 1 -23.7 53.5 54.7 48.2 9 11
4 BhuTES <tibble [11 x 2]> <lm> 0.0185 -0.0905 4.33 0.170 0.690 1 -30.6 67.3 68.4 169. 9 11
5 IndTES <tibble [11 x 2]> <lm> 0.264 0.183 1.11 3.23 0.106 1 -15.7 37.3 38.5 11.1 9 11
6 NepTES <tibble [11 x 2]> <lm> 0.0689 -0.0345 2.49 0.666 0.435 1 -24.5 55.0 56.2 55.6 9 11
7 PakTES <tibble [11 x 2]> <lm> 0.243 0.159 0.872 2.89 0.123 1 -13.0 32.0 33.2 6.84 9 11
8 SATES <tibble [11 x 2]> <lm> 0.221 0.135 0.625 2.56 0.144 1 -9.34 24.7 25.9 3.52 9 11
9 BanERA <tibble [11 x 2]> <lm> 0.0252 -0.0831 0.482 0.233 0.641 1 -6.49 19.0 20.2 2.09 9 11
10 SriERA <tibble [11 x 2]> <lm> 0.00230 -0.109 0.416 0.0208 0.889 1 -4.87 15.7 16.9 1.56 9 11
Upvotes: 2
Reputation: 2039
Regarding saving the data to Excel, if you have your data in a tibble or data.frame, then you can use the function:
readr::write_csv
to save the data as a CSV file, which can be easily opened in Excel;writexl::write_xlsx
to save the data as a native Excel file.Either function will save all rows and columns in your data to a file.
Upvotes: 2