Reputation: 1
I have a data frame (over 1000 samples) that has a percentage of vegetation cover sampled at several years. I want know if the percent cover it generally increasing, decreasing or no trend. I have been trying to fit a linear model in R using tidyverse to the columns as the independent variable and a vector of the year as the x. My thought is if there is a positive slope that is significant I can say that vegetation cover is increasing at that site. Doing this one at a time is fine but I have thousands.
Here is what the dataframe looks like:
Northing Easting veg03 veg07 veg09 veg14 veg19 estimate p-value Sig
4215730 293206 5.12 4.82 3.59 9.87 5.89 0.1732 0.446
4525120 556915 20.88 26.58 31.80 32.69 35.69 0.8712 0.02265 *
4459090 402479 2.03 2.60 3.58 9.93 2.01
4557278 452391 17.86 17.91 17.58 29.52 50.05
4515402 489398 39.83 47.31 46.65 36.91 45.27
4630687 352334 6.30 13.07 17.67 17.32 11.81
4182340 248699 41.75 42.73 44.45 55.27 63.06
4390889 418719 49.08 40.27 45.97 49.50 26.63
I did the first two rows by hand to give the estimate and p-value which is how I want the entire table to look.
Here is the by hand code:
dep <- c(20.88,26.58,31.80,32.69,35.69)
dep2 <- c(5.12, 4.82, 3.59, 9.87, 5.89)
t <- c(3,7,9,14,19)
summary(lm(dep ~ t))
summary(lm(dep2 ~ t))
What I have tried is to nest it and run it with mutate in tidyverse but clearly I do not get it well enough:
fy %>%
as_tibble %>%
nest(data = c(veg03,veg07,veg09,veg14,veg19)) %>%
mutate(lm_model = (lm(t ~ data))) %>%
unnest(lm_model, keep_empty = TRUE)
Any help is greatly appreciated. Thanks
Upvotes: 0
Views: 734
Reputation: 1367
Generally the tidyverse functions are looking for a "longer" data format, where each observation of the same variable is in a single column. To stay with the "tidy" approach, you would want to convert your columns to a variable and then fitting models to grouped or nested data.
library(tidyverse)
### reading in your sample base data
cnames <- c("Northing" ,"Easting" ,"veg03" ,"veg07" ,"veg09" ,"veg14" ,"veg19" )
dt <- c(
4215730 ,293206 ,5.12 ,4.82 ,3.59 ,9.87 ,5.89 ,
4525120 ,556915 ,20.88 ,26.58 ,31.80 ,32.69 ,35.69 ,
4459090 ,402479 ,2.03 ,2.60 ,3.58 ,9.93 ,2.01 ,
4557278 ,452391 ,17.86 ,17.91 ,17.58 ,29.52 ,50.05,
4515402 ,489398 ,39.83 ,47.31 ,46.65 ,36.91 ,45.27,
4630687 ,352334 ,6.30 ,13.07 ,17.67 ,17.32 ,11.81 ,
4182340 ,248699 ,41.75 ,42.73 ,44.45 ,55.27 ,63.06,
4390889 ,418719 ,49.08 ,40.27 ,45.97 ,49.50 ,26.63
)
dim(dt) <- c(7,8)
dt = t(dt)
colnames(dt) <- cnames
dt = as.tibble(dt)
dt
# # A tibble: 8 x 7
# Northing Easting veg03 veg07 veg09 veg14 veg19
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 4215730 293206 5.12 4.82 3.59 9.87 5.89
# 2 4525120 556915 20.9 26.6 31.8 32.7 35.7
# 3 4459090 402479 2.03 2.6 3.58 9.93 2.01
# 4 4557278 452391 17.9 17.9 17.6 29.5 50.0
# 5 4515402 489398 39.8 47.3 46.6 36.9 45.3
# 6 4630687 352334 6.3 13.1 17.7 17.3 11.8
# 7 4182340 248699 41.8 42.7 44.4 55.3 63.1
# 8 4390889 418719 49.1 40.3 46.0 49.5 26.6
## Adding ID column and converting columns to rows
mydata = dt %>%
mutate(ID = row_number()) %>%
pivot_longer(starts_with("veg"),
names_to = "year",
names_prefix = "veg") %>%
mutate(year = as.integer(year))
mydata
# # A tibble: 40 x 5
# Northing Easting ID year value
# <dbl> <dbl> <int> <int> <dbl>
# 1 4215730 293206 1 3 5.12
# 2 4215730 293206 1 7 4.82
# 3 4215730 293206 1 9 3.59
# 4 4215730 293206 1 14 9.87
# 5 4215730 293206 1 19 5.89
# 6 4525120 556915 2 3 20.9
# 7 4525120 556915 2 7 26.6
# 8 4525120 556915 2 9 31.8
# 9 4525120 556915 2 14 32.7
# 10 4525120 556915 2 19 35.7
# # ... with 30 more rows
With this format, the year is recognized as a number, and each original row has an ID.
To fit the models, we'll use group_by
, fit the model and extract the slope and p.values. This will return a data frame (tibble) with a row for each row of our original data, and includes the fitted model and the extracted values.
out.data <- mydata %>%
group_by(ID) %>%
do(model = lm(value ~ year, data = .)) %>%
mutate(slope = model$coefficients[2],
p.value = summary(model)$coefficients[2,4]
)
out.data
# # A tibble: 8 x 4
# # Rowwise:
# ID model slope p.value
# <int> <list> <dbl> <dbl>
# 1 1 <lm> 0.173 0.446
# 2 2 <lm> 0.871 0.0226
# 3 3 <lm> 0.156 0.638
# 4 4 <lm> 2.06 0.0319
# 5 5 <lm> 0.00832 0.986
# 6 6 <lm> 0.310 0.487
# 7 7 <lm> 1.45 0.00728
# 8 8 <lm> -1.01 0.221
You could add this back to the original data with a join.
You can accomplish the same thing using nested data tables after pivoting to "long data" format
using_nested <- mydata %>%
nest(-ID) %>%
mutate(fit = map(data, ~ lm(value ~ year, data = .))) %>%
rowwise() %>%
mutate(
slope = fit$coefficients[2],
p.value = summary(fit)$coefficients[2,4]
)
using_nested
# # A tibble: 8 x 5
# # Rowwise:
# ID data fit slope p.value
# <int> <list> <list> <dbl> <dbl>
# 1 1 <tibble [5 x 4]> <lm> 0.173 0.446
# 2 2 <tibble [5 x 4]> <lm> 0.871 0.0226
# 3 3 <tibble [5 x 4]> <lm> 0.156 0.638
# 4 4 <tibble [5 x 4]> <lm> 2.06 0.0319
# 5 5 <tibble [5 x 4]> <lm> 0.00832 0.986
# 6 6 <tibble [5 x 4]> <lm> 0.310 0.487
# 7 7 <tibble [5 x 4]> <lm> 1.45 0.00728
# 8 8 <tibble [5 x 4]> <lm> -1.01 0.221
Upvotes: 1
Reputation: 9878
You probably need to loop apply some function. combining lapply() with a custom function that creates a model for every line may do the trick:
lapply(1:nrow(fy), function(x){summary(lm(x[3:7]~c(3,7,9,14,19)))})
then you can extract the estimates and p values from the list
Upvotes: 0