Amrit

Reputation: 53

Calculate the trend at each point in time-series data in R

I have annual time-series data on maize production from 1979 to 2020. A sample of the data looks like this:

year  production
1979  1061
1980  1900
1981  1701
1982  1180
.
.
.
2020  1245

Now I need to calculate the trend value for each year in order to separate out the trend component of production. How can I do this in R? Can it be done with a linear regression model? According to the literature, maize production can be separated into trend yield, climate yield, and random error as follows:

Y=Yt+Yc+ε

where Y is the maize production, Yt is the trend yield, Yc is the climate yield, and ε is the yield component affected by other random factors, which can be ignored.

I need to separate climate yield from total production.

Thank you in advance for your help :)

Upvotes: 0

Views: 837

Answers (2)

G. Grothendieck

Reputation: 269854

1) Linear regression. Assuming

  1. the input data frame dd shown reproducibly in the Note at the end (which with 4 points is not really enough but we use what we have)
  2. linear regression is to be used as stated in the question
  3. y, yc, yt and e in the question are the yield (production), the mean yield, the trend effect and the residuals respectively from a linear regression of production on year

we run the regression using lm and then get the decomposition using proj. No packages are used.

fm <- lm(production ~ year, dd)
p <- proj(fm)

# check that components sum to yield
all.equal(dd$production, rowSums(p), check.attributes = FALSE)
## [1] TRUE

tt <- ts(cbind(dd$production, p), start = dd$year[1])
colnames(tt) <- c("y", "yc", "yt", "e")
tt
## Time Series:
## Start = 1979 
## End = 1982 
## Frequency = 1 
##         y     yc    yt      e
## 1979 1061 1460.5 -23.7 -375.8
## 1980 1900 1460.5  -7.9  447.4
## 1981 1701 1460.5   7.9  232.6
## 1982 1180 1460.5  23.7 -304.2

# plot
plot(tt, main = "yield and components")

(Figure: multi-panel plot of y, yc, yt and e, titled "yield and components".)
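If a single trend value per year is wanted (intercept plus slope times year), it can be read off the same fit with fitted(). This is only a minimal sketch, assuming the four-point dd from the Note; the names trend and detrended are introduced here for illustration and are not part of the answer above.

```r
# dd as in the Note (only the four years shown in the question)
dd <- data.frame(year = 1979:1982,
                 production = c(1061L, 1900L, 1701L, 1180L))

fm <- lm(production ~ year, data = dd)
p <- proj(fm)  # three columns: intercept, year effect, residuals

# fitted trend line per year = mean component + trend effect
trend <- fitted(fm)
all.equal(unname(trend), unname(p[, 1] + p[, 2]))

# what is left after removing the linear trend (the e column above)
detrended <- residuals(fm)
data.frame(dd, trend = trend, detrended = detrended)
```

With the full 1979 to 2020 series in place of dd, the same calls should give one trend value and one detrended value per year.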

2) HP filter. Another approach is to define yc as the mean yield, as above, but to use the output of the Hodrick-Prescott (HP) filter to define the trend.

(There are other possibilities too, such as running an HP filter on the residuals of the linear regression and then defining the HP trend as yc, giving four components: mean, yt, yc and e, or possibly combining the mean with one of the other components. However, in the absence of a specific definition of what is actually wanted, we won't pursue the many possibilities.)

library(mFilter)

y <- with(dd, ts(production, start = year[1]))
yc <- mean(y)
yt <- hpfilter(y - yc)$trend
e <- y - yc - yt

tt2 <- cbind(y, yc, yt, e); tt2
## Time Series:
## Start = 1979 
## End = 1982 
## Frequency = 1 
##         y     yc         yt         e
## 1979 1061 1460.5 -50.440731 -349.0593
## 1980 1900 1460.5  20.014502  419.4855
## 1981 1701 1460.5  32.293190  208.2068
## 1982 1180 1460.5  -1.866961 -278.6330

plot(tt2, main = "yield and HP components")

(Figure: multi-panel plot of y, yc, yt and e, titled "yield and HP components".)
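For readers who prefer not to depend on a package, the HP trend can also be sketched in base R from its closed form: the trend minimizes the squared deviations from the data plus lambda times the squared second differences of the trend. The helper hp_trend below is hypothetical, and lambda = 100 is an assumed value often used for annual data; neither comes from the question, and no claim is made that it matches the mFilter defaults.

```r
# Hypothetical helper: HP trend via the closed-form solution
#   trend = (I + lambda * D'D)^(-1) y,
# where D takes second differences. lambda = 100 is an assumption.
hp_trend <- function(y, lambda = 100) {
  n <- length(y)
  D <- diff(diag(n), differences = 2)  # (n - 2) x n second-difference matrix
  drop(solve(diag(n) + lambda * crossprod(D), as.numeric(y)))
}

production <- c(1061, 1900, 1701, 1180)  # the four years from the question
yt_hp <- hp_trend(production)            # smooth trend, one value per year
e_hp  <- production - yt_hp              # deviation from the trend
cbind(year = 1979:1982, production, yt_hp, e_hp)
```

Here the filter is applied to the raw series, so the trend includes the level rather than being centered on zero. A larger lambda pushes the trend toward a straight line, and a smaller one lets it follow the data more closely.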

Note

dd <- structure(list(year = 1979:1982, production = c(1061L, 1900L, 
1701L, 1180L)), class = "data.frame", row.names = c(NA, -4L))

Update

Have made some improvements and added a second approach.

Upvotes: 1

Brian Montgomery

Reputation: 2414

You have only one independent variable, year, so the best you can do is Y = Yt + ε.
That can be done with lm(production ~ year, data = data).
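A minimal sketch of that suggestion, assuming the four sample rows from the question are stored in a data frame named data. The column names Yt and eps are illustrative, and the trend is computed by hand from the coefficients to make the formula explicit.

```r
# Sample rows from the question; the real series would run to 2020
data <- data.frame(year = 1979:1982,
                   production = c(1061, 1900, 1701, 1180))

fit <- lm(production ~ year, data = data)
b <- coef(fit)  # intercept and slope (change in production per year)

# trend value for each year, and the remainder Y - Yt
data$Yt  <- b[["(Intercept)"]] + b[["year"]] * data$year
data$eps <- data$production - data$Yt
data
```

With only year as a predictor, any climate effect stays inside eps. Separating Yc would need climate variables added to the model.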

Upvotes: 0
