Pablo Fernandez heelhook
Pablo Fernandez heelhook

Reputation: 12503

Time series segmentation

I have arrays of time series, averaging about 1000 values per array. I need to independently identify time series segments in each array.

I'm currently using the approach to calculate the mean of the array and segment items whenever the elapsed time between each item exceeds it. I couldn't find much information on standards on how to accomplish this. I'm sure there are more appropriate methods.

This is the code that I'm currently using.

def time_cluster(input)
    input.sort!
    differences = (input.size-1).times.to_a.map {|i| input[i+1] - input[i] }
    mean = differences.mean

    clusters = []
    j = 0

    input.each_index do |i|
      j += 1 if i > 0 and differences[i-1] > mean
      (clusters[j] ||= []) << input[i]
    end

    return clusters
  end

A couple of samples from this code

time_cluster([1, 2, 3, 4, 7, 9, 250, 254, 258, 270, 292, 340, 345, 349, 371, 375, 382, 405, 407, 409, 520, 527])

Outputs

1  2  3  4  7  9, sparsity 1.3
250  254  258  270  292,  sparsity 8.4
340  345  349  371  375  382  405  407  409, sparsity 7
520  527, sparsity 3

Another array

time_cluster([1, 2, 3, 4 , 5, 6, 7, 8, 9, 10, 1000, 1020, 1040, 1060, 1080, 1200])

Outputs

1  2  3  4  5  6  7  8  9  10, sparsity 0.9
1000  1020  1040  1060  1080, sparsity 16
1200

Upvotes: 4

Views: 4352

Answers (1)

dpott197
dpott197

Reputation: 1048

Use K-Means. http://ai4r.rubyforge.org/machineLearning.html

gem install ai4r

Singular Value Decomposition may also interest you. http://www.igvita.com/2007/01/15/svd-recommendation-system-in-ruby/

If you can't do it in Ruby, here is a great example in Python.

Unsupervised clustering with unknown number of clusters

Upvotes: 1

Related Questions