Reputation: 12503
I have arrays of time series, averaging about 1000 values per array. I need to independently identify time series segments in each array.
My current approach is to compute the mean gap between consecutive items and start a new segment whenever the elapsed time between two items exceeds that mean. I couldn't find much information on standard ways to accomplish this, and I'm sure there are more appropriate methods.
This is the code that I'm currently using.
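def time_cluster(input)
  # Sort a copy so the caller's array is left untouched.
  sorted = input.sort
  # Gaps between consecutive timestamps.
  differences = sorted.each_cons(2).map { |a, b| b - a }
  # Core Ruby arrays have no #mean, so compute it explicitly.
  mean = differences.sum.to_f / differences.size
  clusters = []
  j = 0
  sorted.each_index do |i|
    # Start a new cluster when the gap to the previous item exceeds the mean gap.
    j += 1 if i > 0 && differences[i - 1] > mean
    (clusters[j] ||= []) << sorted[i]
  end
  clusters
end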
A couple of sample runs of this code:
time_cluster([1, 2, 3, 4, 7, 9, 250, 254, 258, 270, 292, 340, 345, 349, 371, 375, 382, 405, 407, 409, 520, 527])
Outputs
1 2 3 4 7 9, sparsity 1.3
250 254 258 270 292, sparsity 8.4
340 345 349 371 375 382 405 407 409, sparsity 7
520 527, sparsity 3
Another array
time_cluster([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1000, 1020, 1040, 1060, 1080, 1200])
Outputs
1 2 3 4 5 6 7 8 9 10, sparsity 0.9
1000 1020 1040 1060 1080, sparsity 16
1200
Upvotes: 4
Views: 4352
Reputation: 1048
Use K-Means. http://ai4r.rubyforge.org/machineLearning.html
gem install ai4r
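To give a feel for what K-Means does on one-dimensional timestamps, here is a minimal sketch in plain Ruby. It does not use ai4r's API: kmeans_1d, its parameters, and the farthest-point seeding are my own assumptions for illustration. Note that, unlike the mean-gap approach in the question, you have to choose the number of clusters k up front.

```ruby
# Minimal one-dimensional k-means (Lloyd's algorithm) in plain Ruby.
# Illustrative sketch only: kmeans_1d is a hypothetical helper, not part of ai4r.
def kmeans_1d(values, k, max_iterations: 100)
  raise ArgumentError, "k must be at least 1" if k < 1
  sorted = values.sort
  return [] if sorted.empty?

  # There cannot be more clusters than distinct values.
  k = [k, sorted.uniq.size].min

  # Deterministic farthest-point seeding (an assumption of this sketch):
  # start at the smallest value, then repeatedly add the value that is
  # farthest from every centroid chosen so far.
  centroids = [sorted.first.to_f]
  while centroids.size < k
    centroids << sorted.max_by { |v| centroids.map { |c| (v - c).abs }.min }.to_f
  end

  clusters = []
  max_iterations.times do
    # Assignment step: each value joins its nearest centroid.
    clusters = Array.new(k) { [] }
    sorted.each do |v|
      nearest = (0...k).min_by { |c| (v - centroids[c]).abs }
      clusters[nearest] << v
    end

    # Update step: move each centroid to the mean of its members
    # (an empty cluster keeps its previous centroid).
    updated = clusters.each_with_index.map do |members, c|
      members.empty? ? centroids[c] : members.sum.to_f / members.size
    end
    break if updated == centroids # converged
    centroids = updated
  end

  # Return non-empty clusters ordered by their smallest member.
  clusters.reject(&:empty?).sort_by(&:first)
end

kmeans_1d([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1000, 1020, 1040, 1060, 1080, 1200], 3).each do |cluster|
  puts cluster.join(" ")
end
```

K-Means results depend on how the centroids are seeded, so a different starting strategy can give a different split on the same data.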
Singular Value Decomposition may also interest you. http://www.igvita.com/2007/01/15/svd-recommendation-system-in-ruby/
If you can't do it in Ruby, here is a great example in Python: "Unsupervised clustering with unknown number of clusters".
Upvotes: 1