Nabi Shaikh
Nabi Shaikh

Reputation: 851

Creating Data set for Change Point Detection Output performed over time series data

I have computed change point detection using cpt.mean and output of this syntax gives constant mean and respective position across times series. So here i am trying to create a data set for the same for below reproducible Example.

s <- data.frame(Tag = c(1,1,1,2,2,2,2),row = c(1,12,22,1,7,9,16),constant_mean=c(-0.12,1.55,0,0.35,1.6,0.86,0))

So here as one can observe there are three column Tag,row&constant_mean

So lets take example for only for Tag=1 for this Maximum rows would be 22 such that row 1 to row 12 the constant mean would be -0.12. and then row13 to row22 the constant mean would be 1.55. and similarly for Tag=2 this would be a group by method.As similar in image as shown below.

enter image description here

Upvotes: 1

Views: 141

Answers (2)

Jonas Lindel&#248;v
Jonas Lindel&#248;v

Reputation: 5683

A more general solution is using the mcp package to simulate change point data. Time series often involve autocorrelation, so let's include that in the mix.

library(mcp)
empty = mcp(model, sample = FALSE, par_x = "x")

df = data.frame(x = 1:200)
df$y = empty$simulate(
  df$x, 
  cp_1 = 90,  # where the change point occurs
  int_1 = -0.12,  # intercept of segment 1
  int_2 = 1.66,  # intercept of segment 2
  ar1_1 = 0.6,  # autoregressive strength
  sigma_1 = 0.5)  # standard deviations of the residuals (innovations)

Let's go ahead and use mcp to see if we can recover these parameters, also to get the default plot visualizing the data:

fit = mcp(model, df, par_x = "x")
plot(fit)

enter image description here

See the parameter estimates:

summary(fit)

Family: gaussian(link = 'identity')
Iterations: 9000 from 3 chains.
Segments:
  1: y ~ 1 + ar(1)
  2: y ~ 1 ~ 1

Population-level parameters:
    name match   sim  mean lower upper Rhat n.eff
   ar1_1    OK  0.60  0.61  0.49  0.75    1   164
    cp_1       90.00 90.04 89.01 89.99    1    15
   int_1       -0.12 -0.45 -0.75 -0.18    1   391
   int_2    OK  1.66  1.48  1.22  1.73    1  1178
 sigma_1    OK  0.50  0.50  0.45  0.55    1   681

This site contains more information about modeling time series data with mcp. Disclosure: I am the developer of mcp.

Upvotes: 0

David Arenburg
David Arenburg

Reputation: 92282

Here is a suggestion using data.table

library(data.table)
res <-
  setDT(s)[, 
          .SD[c(1L, rep(1L : (.N - 1L), diff(row)))],
          by = Tag
         ][, row := .I] # don't think this is needed

res
#     Tag row constant_mean
#  1:   1   1         -0.12
#  2:   1   2         -0.12
#  3:   1   3         -0.12
#  4:   1   4         -0.12
#  5:   1   5         -0.12
#  6:   1   6         -0.12
#  7:   1   7         -0.12
#  8:   1   8         -0.12
#  9:   1   9         -0.12
# 10:   1  10         -0.12
# 11:   1  11         -0.12
# 12:   1  12         -0.12
# 13:   1  13          1.55
# 14:   1  14          1.55
# 15:   1  15          1.55
# 16:   1  16          1.55
# 17:   1  17          1.55
# 18:   1  18          1.55
# 19:   1  19          1.55
# 20:   1  20          1.55
# 21:   1  21          1.55
# 22:   1  22          1.55
# 23:   2  23          0.35
# 24:   2  24          0.35
# 25:   2  25          0.35
# 26:   2  26          0.35
# 27:   2  27          0.35
# 28:   2  28          0.35
# 29:   2  29          0.35
# 30:   2  30          1.60
# 31:   2  31          1.60
# 32:   2  32          0.86
# 33:   2  33          0.86
# 34:   2  34          0.86
# 35:   2  35          0.86
# 36:   2  36          0.86
# 37:   2  37          0.86
# 38:   2  38          0.86

This basically expends the rows per Tag by the diff in row. I'm adding a 1 at the beginning because row in each group begins from 1 and hence the diff is always short by one. Regarding row, I wasn't sure if you just want the overall row number (which is redundant IMO) or you want it by Tag. if the later is the case, then this would be something like [, row := as.numeric(seq_len(.N)), by = Tag]

Upvotes: 3

Related Questions