Reputation: 851
I have run change point detection using cpt.mean, and its output gives a constant mean and the corresponding position across the time series. I am trying to build a data set from that output; here is a reproducible example:
s <- data.frame(Tag = c(1, 1, 1, 2, 2, 2, 2),
                row = c(1, 12, 22, 1, 7, 9, 16),
                constant_mean = c(-0.12, 1.55, 0, 0.35, 1.6, 0.86, 0))
As one can see, there are three columns: Tag, row and constant_mean. Take Tag = 1 as an example: the maximum row is 22, so from row 1 to row 12 the constant mean would be -0.12, and from row 13 to row 22 it would be 1.55. The same logic applies per group for Tag = 2, similar to the image attached below.
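For clarity, the expected output for Tag = 1 would look like the data frame below (written out by hand from the description above, not actual cpt.mean output):
data.frame(Tag = 1, row = 1:22, constant_mean = c(rep(-0.12, 12), rep(1.55, 10)))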
Upvotes: 1
Views: 141
Reputation: 5683
A more general solution is to use the mcp package to simulate change point data. Time series often involve autocorrelation, so let's include that in the mix.
library(mcp)

# A two-segment model: an intercept plus an AR(1) term, then a new intercept
# after the change point (matching the segments shown by summary() below)
model = list(
  y ~ 1 + ar(1),
  ~ 1
)

# Set up the model without sampling, just to simulate data from it
empty = mcp(model, sample = FALSE, par_x = "x")

df = data.frame(x = 1:200)
df$y = empty$simulate(
  df$x,
  cp_1 = 90,      # where the change point occurs
  int_1 = -0.12,  # intercept of segment 1
  int_2 = 1.66,   # intercept of segment 2
  ar1_1 = 0.6,    # autoregressive strength
  sigma_1 = 0.5)  # standard deviation of the residuals (innovations)
Let's go ahead and use mcp to see if we can recover these parameters, and also to get the default plot visualizing the data:
fit = mcp(model, df, par_x = "x")
plot(fit)
See the parameter estimates:
summary(fit)
Family: gaussian(link = 'identity')
Iterations: 9000 from 3 chains.
Segments:
1: y ~ 1 + ar(1)
2: y ~ 1 ~ 1
Population-level parameters:
   name match   sim  mean lower upper Rhat n.eff
  ar1_1    OK  0.60  0.61  0.49  0.75    1   164
   cp_1       90.00 90.04 89.01 89.99    1    15
  int_1       -0.12 -0.45 -0.75 -0.18    1   391
  int_2    OK  1.66  1.48  1.22  1.73    1  1178
sigma_1    OK  0.50  0.50  0.45  0.55    1   681
This site contains more information about modeling time series data with mcp. Disclosure: I am the developer of mcp.
Upvotes: 0
Reputation: 92282
Here is a suggestion using data.table
library(data.table)

res <- setDT(s)[
  , .SD[c(1L, rep(1L:(.N - 1L), diff(row)))],  # repeat each change-point row by diff(row)
  by = Tag
][, row := .I]  # overall row number; don't think this is needed
res
# Tag row constant_mean
# 1: 1 1 -0.12
# 2: 1 2 -0.12
# 3: 1 3 -0.12
# 4: 1 4 -0.12
# 5: 1 5 -0.12
# 6: 1 6 -0.12
# 7: 1 7 -0.12
# 8: 1 8 -0.12
# 9: 1 9 -0.12
# 10: 1 10 -0.12
# 11: 1 11 -0.12
# 12: 1 12 -0.12
# 13: 1 13 1.55
# 14: 1 14 1.55
# 15: 1 15 1.55
# 16: 1 16 1.55
# 17: 1 17 1.55
# 18: 1 18 1.55
# 19: 1 19 1.55
# 20: 1 20 1.55
# 21: 1 21 1.55
# 22: 1 22 1.55
# 23: 2 23 0.35
# 24: 2 24 0.35
# 25: 2 25 0.35
# 26: 2 26 0.35
# 27: 2 27 0.35
# 28: 2 28 0.35
# 29: 2 29 0.35
# 30: 2 30 1.60
# 31: 2 31 1.60
# 32: 2 32 0.86
# 33: 2 33 0.86
# 34: 2 34 0.86
# 35: 2 35 0.86
# 36: 2 36 0.86
# 37: 2 37 0.86
# 38: 2 38 0.86
This basically expands the rows per Tag by the diff in row. I'm adding a 1 at the beginning because row in each group starts at 1, and hence the diff is always short by one (illustrated below for Tag = 1).
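As a small illustration of that indexing for Tag = 1 (values taken from s above, computed outside data.table just to show the mechanics):

row1 <- c(1, 12, 22)                          # the row values for Tag = 1, so .N = 3
idx  <- c(1L, rep(1L:(3L - 1L), diff(row1)))  # same expression as inside .SD[...]
idx
# [1] 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2

Row 1 of the group is repeated 12 times (rows 1 to 12, constant mean -0.12) and row 2 is repeated 10 times (rows 13 to 22, constant mean 1.55).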
Regarding row, I wasn't sure if you just want the overall row number (which is redundant IMO) or you want it by Tag. If the latter is the case, then this would be something like [, row := as.numeric(seq_len(.N)), by = Tag], shown in full below.
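A minimal sketch of the full call with that per-Tag numbering slotted in (same s and data.table as above, otherwise identical to the solution at the top of this answer):

res <- setDT(s)[
  , .SD[c(1L, rep(1L:(.N - 1L), diff(row)))],
  by = Tag
][, row := as.numeric(seq_len(.N)), by = Tag]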
Upvotes: 3