Reputation: 689
I would like to detect the dates where a trend curve significantly changes using R. The red dots are the points in time where I see a significant change; these should be detected. Small fluctuations should be ignored.
I have tried the breakpoints function, which finds the dates indicated by the dotted lines. I don't see how these lines correlate with the data.
Example data from the chart:
structure(c(431.510725286867, 421.634186460535, 379.627938613016,
425.906255600274, -14.1367284804303, -384.10599618701, -611.193815166686,
-460.535003689942, -309.875390598749, -84.9820334889592, 217.330882967973,
437.111949107673, 738.919896124628, 752.79552200685, 804.851028725362,
757.869760812822, 1197.91301915761, 1567.88256933466, 1794.97067632374,
1644.31215300884, 1493.6528224525, 1268.75973855711, 968.432034953716,
743.503624686386, 510.63191994943), .Tsp = c(2016.66666666667,
2018.66666666667, 12), class = "ts")
Upvotes: 1
Views: 1225
Reputation: 6356
Compare the forward and backward finite differences, and filter out small values.
Explicitly: compute ∆(t) = x(t+1)-x(t) and ∇(t) = x(t)-x(t-1), then d(t) = ∆(t)-∇(t) = x(t+1)-2x(t)+x(t-1), and keep the t for which |d(t)| > ε, where ε captures what you call a small fluctuation.
In your case, d = c(NA, -32.1, 88.3, -486.3, 70.1, 142.9, 377.7, 0.0, 74.2, 77.4, -82.5, 82.0, -287.9, 38.2, -99.0, 487.0, -70.1, -142.9, -377.7, -0.0, -74.2, -75.4, 75.4, -7.9, NA). In absolute value, d exceeds ε=200 at t=c(4, 7, 13, 16, 19), which matches your red dots.
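As a minimal sketch, the computation could look like this in R. It takes d as the forward difference minus the backward difference (the second difference), which reproduces the values listed above. The variable names (v, fwd, bwd, eps, idx) are only illustrative, and eps = 200 is the threshold picked by eye for this series.

```r
# Example series from the question (monthly, 25 observations).
x <- structure(c(431.510725286867, 421.634186460535, 379.627938613016,
425.906255600274, -14.1367284804303, -384.10599618701, -611.193815166686,
-460.535003689942, -309.875390598749, -84.9820334889592, 217.330882967973,
437.111949107673, 738.919896124628, 752.79552200685, 804.851028725362,
757.869760812822, 1197.91301915761, 1567.88256933466, 1794.97067632374,
1644.31215300884, 1493.6528224525, 1268.75973855711, 968.432034953716,
743.503624686386, 510.63191994943), .Tsp = c(2016.66666666667,
2018.66666666667, 12), class = "ts")

v <- as.numeric(x)

fwd <- c(diff(v), NA)  # forward difference x(t+1) - x(t); undefined at the last point
bwd <- c(NA, diff(v))  # backward difference x(t) - x(t-1); undefined at the first point
d <- fwd - bwd         # equals x(t+1) - 2*x(t) + x(t-1)

eps <- 200                  # assumed threshold for a "small" fluctuation
idx <- which(abs(d) > eps)  # which() drops the NA endpoints

idx           # positions of the detected changes: 4 7 13 16 19
time(x)[idx]  # the same positions on the time axis of the series
```

Using which() rather than logical indexing avoids having to handle the NA values at both ends of d separately.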
Of course, the threshold of ε=200 may be chosen with more rigor (on a histogram of d, the value of 200 stands out as a natural cut).
You may also want to smooth out the fluctuations by using an average over a few points rather than just the previous and next values: dn(t) = ∆n(t)-∇n(t), where ∆n(t) = [x(t+1)+...+x(t+n)]/n - x(t) and ∇n(t) = x(t) - [x(t-1)+...+x(t-n)]/n.
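One possible reading of this smoothing idea, sketched in R: replace the next and previous values by the mean of the n following and n preceding points before taking the forward and backward differences. The function name smoothed_d and its arguments are hypothetical, and the toy vector is made up purely to show the behaviour at a single kink.

```r
# Smoothed change score: (mean of next n points - x(t)) - (x(t) - mean of previous n points).
# The first and last n positions have no full window and are left as NA.
smoothed_d <- function(v, n = 2) {
  v <- as.numeric(v)
  len <- length(v)
  out <- rep(NA_real_, len)
  if (len < 2 * n + 1) return(out)  # series too short for any full window
  for (t in (n + 1):(len - n)) {
    next_mean <- mean(v[(t + 1):(t + n)])
    prev_mean <- mean(v[(t - n):(t - 1)])
    out[t] <- (next_mean - v[t]) - (v[t] - prev_mean)
  }
  out
}

# Toy series with one kink at position 4 (illustrative data only).
toy <- c(0, 1, 2, 3, 2, 1, 0)
smoothed_d(toy, n = 1)  # plain second difference: only position 4 is non-zero
smoothed_d(toy, n = 2)  # wider window: the kink spreads over positions 3 to 5
```

With n = 1 this reduces to the unsmoothed d. Larger n damps noise but also spreads each change over neighbouring points, so the threshold ε usually needs to be re-tuned.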
Upvotes: 1