Creating Data set for Change Point Detection Output performed over time series data

Question

I have computed change point detection using cpt.mean and output of this syntax gives constant mean and respective position across times series. So here i am trying to create a data set for the same for below reproducible Example.

s <- data.frame(Tag = c(1,1,1,2,2,2,2),row = c(1,12,22,1,7,9,16),constant_mean=c(-0.12,1.55,0,0.35,1.6,0.86,0))

So here as one can observe there are three column Tag,row&constant_mean

So lets take example for only for Tag=1 for this Maximum rows would be 22 such that row 1 to row 12 the constant mean would be -0.12. and then row13 to row22 the constant mean would be 1.55. and similarly for Tag=2 this would be a group by method.As similar in image as shown below.

David Arenburg · Accepted Answer

Here is a suggestion using data.table

library(data.table)
res <-
  setDT(s)[, 
          .SD[c(1L, rep(1L : (.N - 1L), diff(row)))],
          by = Tag
         ][, row := .I] # don't think this is needed

res
#     Tag row constant_mean
#  1:   1   1         -0.12
#  2:   1   2         -0.12
#  3:   1   3         -0.12
#  4:   1   4         -0.12
#  5:   1   5         -0.12
#  6:   1   6         -0.12
#  7:   1   7         -0.12
#  8:   1   8         -0.12
#  9:   1   9         -0.12
# 10:   1  10         -0.12
# 11:   1  11         -0.12
# 12:   1  12         -0.12
# 13:   1  13          1.55
# 14:   1  14          1.55
# 15:   1  15          1.55
# 16:   1  16          1.55
# 17:   1  17          1.55
# 18:   1  18          1.55
# 19:   1  19          1.55
# 20:   1  20          1.55
# 21:   1  21          1.55
# 22:   1  22          1.55
# 23:   2  23          0.35
# 24:   2  24          0.35
# 25:   2  25          0.35
# 26:   2  26          0.35
# 27:   2  27          0.35
# 28:   2  28          0.35
# 29:   2  29          0.35
# 30:   2  30          1.60
# 31:   2  31          1.60
# 32:   2  32          0.86
# 33:   2  33          0.86
# 34:   2  34          0.86
# 35:   2  35          0.86
# 36:   2  36          0.86
# 37:   2  37          0.86
# 38:   2  38          0.86

This basically expends the rows per Tag by the diff in row. I'm adding a 1 at the beginning because row in each group begins from 1 and hence the diff is always short by one. Regarding row, I wasn't sure if you just want the overall row number (which is redundant IMO) or you want it by Tag. if the later is the case, then this would be something like [, row := as.numeric(seq_len(.N)), by = Tag]

Creating Data set for Change Point Detection Output performed over time series data

Answers (2)

Related Questions