Time series segmentation

Question

I have arrays of time series, averaging about 1000 values per array. I need to independently identify time series segments in each array.

My current approach is to compute the mean gap between consecutive items and start a new segment whenever the elapsed time between two neighbouring items exceeds that mean. I couldn't find much information on standard ways to do this, and I'm sure there are more appropriate methods.

This is the code that I'm currently using.

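def time_cluster(input)
  # Sort a copy so the caller's array is not mutated
  sorted = input.sort
  return [] if sorted.empty?

  # Gaps between consecutive timestamps
  differences = sorted.each_cons(2).map { |a, b| b - a }
  # Plain Ruby arrays have no #mean, so compute it directly
  mean = differences.sum / differences.size.to_f

  clusters = [[sorted.first]]
  differences.each_with_index do |gap, i|
    # Start a new cluster whenever a gap exceeds the mean gap
    clusters << [] if gap > mean
    clusters.last << sorted[i + 1]
  end

  clusters
end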

A couple of samples from this code

time_cluster([1, 2, 3, 4, 7, 9, 250, 254, 258, 270, 292, 340, 345, 349, 371, 375, 382, 405, 407, 409, 520, 527])

Outputs

1  2  3  4  7  9, sparsity 1.3
250  254  258  270  292,  sparsity 8.4
340  345  349  371  375  382  405  407  409, sparsity 7
520  527, sparsity 3

Another array

time_cluster([1, 2, 3, 4 , 5, 6, 7, 8, 9, 10, 1000, 1020, 1040, 1060, 1080, 1200])

Outputs

1  2  3  4  5  6  7  8  9  10, sparsity 0.9
1000  1020  1040  1060  1080, sparsity 16
1200

dpott197 · Accepted Answer

Use K-Means. http://ai4r.rubyforge.org/machineLearning.html

gem install ai4r
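
To make the suggestion concrete, here is a minimal sketch of k-means on one-dimensional timestamps in plain Ruby. It does not use ai4r and is not that gem's API; `kmeans_1d`, its parameters, and the evenly spaced seeding are assumptions chosen for illustration. Note that k-means needs the number of clusters `k` up front.

```ruby
# Minimal 1-D k-means sketch (plain Ruby, no gems).
# Hypothetical helper for illustration; not ai4r's implementation.
def kmeans_1d(values, k, max_iterations = 100)
  sorted = values.sort
  # Seed centroids with k evenly spaced samples from the sorted data,
  # which keeps the result deterministic (an assumption of this sketch).
  centroids = Array.new(k) { |c| sorted[c * (sorted.size - 1) / [k - 1, 1].max].to_f }
  groups = {}

  max_iterations.times do
    # Assignment step: each value joins its nearest centroid.
    groups = sorted.group_by do |v|
      centroids.each_index.min_by { |c| (v - centroids[c]).abs }
    end
    # Update step: move each centroid to the mean of its members.
    updated = centroids.each_index.map do |c|
      members = groups[c]
      members ? members.sum / members.size.to_f : centroids[c]
    end
    break if updated == centroids # converged
    centroids = updated
  end

  # Return clusters ordered by centroid index (ascending in time).
  groups.sort.map { |_, members| members }
end

clusters = kmeans_1d([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1000, 1020, 1040, 1060, 1080, 1200], 2)
clusters.each { |cluster| puts cluster.join("  ") }
```

With `k = 2` this separates the 1..10 run from the values at 1000 and above. A poorly chosen `k` can split a tight run or merge distant ones, so the result depends heavily on that choice.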

Singular Value Decomposition may also interest you. http://www.igvita.com/2007/01/15/svd-recommendation-system-in-ruby/

If you can't do it in Ruby, here is a great example in Python: "Unsupervised clustering with unknown number of clusters".
