Reputation: 1

Linear model across several columns that are years in R

I have a data frame (over 1000 samples) that has a percentage of vegetation cover sampled at several years. I want know if the percent cover it generally increasing, decreasing or no trend. I have been trying to fit a linear model in R using tidyverse to the columns as the independent variable and a vector of the year as the x. My thought is if there is a positive slope that is significant I can say that vegetation cover is increasing at that site. Doing this one at a time is fine but I have thousands.

Here is what the dataframe looks like:

enter image description here

Northing    Easting veg03   veg07   veg09   veg14   veg19   estimate    p-value Sig

4215730 293206  5.12    4.82    3.59    9.87    5.89    0.1732  0.446   
4525120 556915  20.88   26.58   31.80   32.69   35.69   0.8712  0.02265 *
4459090 402479  2.03    2.60    3.58    9.93    2.01            
4557278 452391  17.86   17.91   17.58   29.52   50.05           
4515402 489398  39.83   47.31   46.65   36.91   45.27           
4630687 352334  6.30    13.07   17.67   17.32   11.81           
4182340 248699  41.75   42.73   44.45   55.27   63.06           
4390889 418719  49.08   40.27   45.97   49.50   26.63

I did the first two rows by hand to give the estimate and p-value which is how I want the entire table to look.

Here is the by hand code:

dep <- c(20.88,26.58,31.80,32.69,35.69)
dep2 <- c(5.12, 4.82, 3.59, 9.87, 5.89)
t <- c(3,7,9,14,19)

summary(lm(dep ~ t))
summary(lm(dep2 ~ t))

What I have tried is to nest it and run it with mutate in tidyverse but clearly I do not get it well enough:

fy %>% 
  as_tibble %>%
  nest(data = c(veg03,veg07,veg09,veg14,veg19)) %>%
  mutate(lm_model = (lm(t ~ data))) %>%
  unnest(lm_model, keep_empty = TRUE)

Any help is greatly appreciated. Thanks

Upvotes: 0

Answers (2)

Brian Fisher

Reputation: 1367

Generally the tidyverse functions are looking for a "longer" data format, where each observation of the same variable is in a single column. To stay with the "tidy" approach, you would want to convert your columns to a variable and then fitting models to grouped or nested data.

library(tidyverse)

### reading in your sample base data
cnames <-  c("Northing" ,"Easting" ,"veg03" ,"veg07" ,"veg09" ,"veg14" ,"veg19" )

dt <-  c(
4215730 ,293206 ,5.12 ,4.82 ,3.59 ,9.87 ,5.89 ,
4525120 ,556915 ,20.88 ,26.58 ,31.80 ,32.69 ,35.69 ,
4459090 ,402479 ,2.03 ,2.60 ,3.58 ,9.93 ,2.01 ,
4557278 ,452391 ,17.86 ,17.91 ,17.58 ,29.52 ,50.05,
4515402 ,489398 ,39.83 ,47.31 ,46.65 ,36.91 ,45.27,
4630687 ,352334 ,6.30 ,13.07 ,17.67 ,17.32 ,11.81 ,
4182340 ,248699 ,41.75 ,42.73 ,44.45 ,55.27 ,63.06,
4390889 ,418719 ,49.08 ,40.27 ,45.97 ,49.50 ,26.63 
)
dim(dt) <- c(7,8)
dt = t(dt)
colnames(dt) <- cnames
dt = as.tibble(dt)
dt

# # A tibble: 8 x 7
# Northing Easting veg03 veg07 veg09 veg14 veg19
# <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1  4215730  293206  5.12  4.82  3.59  9.87  5.89
# 2  4525120  556915 20.9  26.6  31.8  32.7  35.7 
# 3  4459090  402479  2.03  2.6   3.58  9.93  2.01
# 4  4557278  452391 17.9  17.9  17.6  29.5  50.0 
# 5  4515402  489398 39.8  47.3  46.6  36.9  45.3 
# 6  4630687  352334  6.3  13.1  17.7  17.3  11.8 
# 7  4182340  248699 41.8  42.7  44.4  55.3  63.1 
# 8  4390889  418719 49.1  40.3  46.0  49.5  26.6 

## Adding ID column and converting columns to rows
mydata = dt %>%
      mutate(ID = row_number()) %>%
      pivot_longer(starts_with("veg"),
                   names_to = "year",
                   names_prefix = "veg") %>%
      mutate(year = as.integer(year))
mydata      

# # A tibble: 40 x 5
# Northing Easting    ID  year value
# <dbl>   <dbl> <int> <int> <dbl>
# 1  4215730  293206     1     3  5.12
# 2  4215730  293206     1     7  4.82
# 3  4215730  293206     1     9  3.59
# 4  4215730  293206     1    14  9.87
# 5  4215730  293206     1    19  5.89
# 6  4525120  556915     2     3 20.9 
# 7  4525120  556915     2     7 26.6 
# 8  4525120  556915     2     9 31.8 
# 9  4525120  556915     2    14 32.7 
# 10  4525120  556915     2    19 35.7 
# # ... with 30 more rows

With this format, the year is recognized as a number, and each original row has an ID.

To fit the models, we'll use group_by, fit the model and extract the slope and p.values. This will return a data frame (tibble) with a row for each row of our original data, and includes the fitted model and the extracted values.

out.data <- mydata %>%
      group_by(ID) %>%
      do(model = lm(value ~ year, data = .)) %>%
      mutate(slope = model$coefficients[2],
             p.value = summary(model)$coefficients[2,4]
             )
out.data
# # A tibble: 8 x 4
# # Rowwise: 
# ID model     slope p.value
# <int> <list>    <dbl>   <dbl>
# 1     1 <lm>    0.173   0.446  
# 2     2 <lm>    0.871   0.0226 
# 3     3 <lm>    0.156   0.638  
# 4     4 <lm>    2.06    0.0319 
# 5     5 <lm>    0.00832 0.986  
# 6     6 <lm>    0.310   0.487  
# 7     7 <lm>    1.45    0.00728
# 8     8 <lm>   -1.01    0.221

You could add this back to the original data with a join.

You can accomplish the same thing using nested data tables after pivoting to "long data" format

using_nested <- mydata %>%
      nest(-ID) %>%
      mutate(fit = map(data, ~ lm(value ~ year, data = .))) %>%
      rowwise() %>%
      mutate(
             slope = fit$coefficients[2],
             p.value = summary(fit)$coefficients[2,4]
             )
using_nested
# # A tibble: 8 x 5
# # Rowwise: 
# ID data             fit       slope p.value
# <int> <list>           <list>    <dbl>   <dbl>
# 1     1 <tibble [5 x 4]> <lm>    0.173   0.446  
# 2     2 <tibble [5 x 4]> <lm>    0.871   0.0226 
# 3     3 <tibble [5 x 4]> <lm>    0.156   0.638  
# 4     4 <tibble [5 x 4]> <lm>    2.06    0.0319 
# 5     5 <tibble [5 x 4]> <lm>    0.00832 0.986  
# 6     6 <tibble [5 x 4]> <lm>    0.310   0.487  
# 7     7 <tibble [5 x 4]> <lm>    1.45    0.00728
# 8     8 <tibble [5 x 4]> <lm>   -1.01    0.221

Upvotes: 1

GuedesBF

Reputation: 9878

You probably need to loop apply some function. combining lapply() with a custom function that creates a model for every line may do the trick:

lapply(1:nrow(fy), function(x){summary(lm(x[3:7]~c(3,7,9,14,19)))})

then you can extract the estimates and p values from the list

Upvotes: 0

Linear model across several columns that are years in R

Answers (2)

Related Questions