Oriol Baena Crespo
Oriol Baena Crespo

Reputation: 347

R : Linear regressions (more than 1) on a single dataframe. Results as table

My dataset is similar to:

data <- tibble( "DATE_FIRE"= c("1989-07-31", "1989-07-31", "1989-07-31", "1989-07-31","1989-07-31","1989-08-31", "1989-08-31", "1989-08-31", "1989-08-31","1989-08-31"), 
       "FID" = c(1,1,1,1,1,2,2,2,2,2),
       "Date" = c(1988, 1989, 1990, 1991, 1992, 1988, 1989, 1990, 1991, 1992),
       "NDVI" = c( 0.9, 0.8, 0.1, 0.2, 0.3, 0.8, 0.85, 0.15, 0.30, 0.50))

data$DATE_FIRE <- as.Date(data$DATE_FIRE, format= "%Y-%m-%d")
data$FID <- as.factor(data$FID)

    > data
# A tibble: 10 x 4
   DATE_FIRE  FID    Date  NDVI
   <date>     <fct> <dbl> <dbl>
 1 1989-07-31 1      1988  0.9 
 2 1989-07-31 1      1989  0.8 
 3 1989-07-31 1      1990  0.1 
 4 1989-07-31 1      1991  0.2 
 5 1989-07-31 1      1992  0.3 
 6 1989-08-31 2      1988  0.8 
 7 1989-08-31 2      1989  0.85
 8 1989-08-31 2      1990  0.15
 9 1989-08-31 2      1991  0.3 
10 1989-08-31 2      1992  0.5 

It is about forest fire and its recovery by NDVI values. As forest recover, NDVI value rises.

  1. DATE_FIRE: Year when the fire took place for each plot
  2. FID: ID of each plot
  3. Date: date of the measurement of NDVI
  4. NDVI: NDVI value

What I would like to do is to perform 2 linear regressions, one for FID=1 and another one for FID=2, to compare their recovery rate. I have, though, to apply the recovery rate ONLY to NDVI values corresponding to dates after the fire (determined by DATE_FIRE) took place. In case of FID=1, I should only take rows 3, 4 and 5, as rows 1 and 2 correspond to measurements before the fire.

Furthermore, I would like to have my results as table; something like:

> desired_output
# A tibble: 2 x 4
    FID  beta    r2     p
  <dbl> <dbl> <dbl> <dbl>
1     1 0.1    1     0   
2     2 0.175  0.99  0.01

WHAT I HAVE TRIED SO FAR:

Set DATE_FIRE as years to be comparable to Date:

data$DATE_FIRE <- year(data$DATE_FIRE)

Then:

data_d <- data %>%
  group_by(FID) %>%
  filter(Date > DATE_FIRE) %>%
  do(tidy(lm(NDVI ~ Date,data)))

The grouping kind of works but not the filter. Any help will be welcome!

Upvotes: 1

Views: 64

Answers (1)

tmfmnk
tmfmnk

Reputation: 40171

One option involving dplyr, tidyr, lubridate, purrr and broom could be:

data %>%
 group_by(DATE_FIRE, FID) %>%
 filter(Date > year(DATE_FIRE)) %>%
 nest() %>%
 mutate(model = map(data, ~ tidy(lm(NDVI ~ Date, data = .))),
        r2 = map_dbl(data, ~ summary(lm(NDVI ~ Date, data = .))$r.squared)) %>%
 unnest(model)

  DATE_FIRE  FID             data term        estimate std.error statistic  p.value    r2
  <date>     <fct> <list<df[,2]>> <chr>          <dbl>     <dbl>     <dbl>    <dbl> <dbl>
1 1989-07-31 1            [3 × 2] (Intercept) -199.     5.85e-11  -3.40e12 1.87e-13 1    
2 1989-07-31 1            [3 × 2] Date           0.1    2.94e-14   3.41e12 1.87e-13 1    
3 1989-08-31 2            [3 × 2] (Intercept) -348.     2.87e+ 1  -1.21e 1 5.24e- 2 0.993
4 1989-08-31 2            [3 × 2] Date           0.175  1.44e- 2   1.21e 1 5.24e- 2 0.993

Upvotes: 1

Related Questions