Jason Strimpel

MATLAB: Identify if a value is repeated sequentially N times in a vector

I am trying to determine whether a value is repeated N times in a row in a vector. The difficulty is that such a run of N can occur several times within the same vector. The purpose is to count how often certain values stay above the mean value for several elements in a row. For example:

>> return_deltas

return_deltas = 

      7.49828129642663
      11.5098198572327
      15.1776644881294
       11.256677995536
      6.22315734182976
      8.75582103474613
      21.0488849115947
       26.132605745393
      27.0507649089989
      ...

(I only printed a few values for example but the vector is large.)

>> mean(return_deltas)

ans =

     10.50007490258002

>> sum(return_deltas > mean(return_deltas))

ans =

    50

So there are 50 instances of a value in return_deltas being greater than the mean of return_deltas.

I need to count how many times the values in return_deltas are greater than the mean 3 times in a row. In other words, each stretch of 3 consecutive values above the mean counts as one instance.

For example:

---------------------------------------------------------------------
| `return_delta` value | mean        | greater or less | sequence   |
|--------------------------------------------------------------------
|   7.49828129642663   |10.500074902 | LT              | 1          |
|  11.5098198572327    |10.500074902 | GT              | 1          |
|  15.1776644881294    |10.500074902 | GT              | 2          |
|   11.256677995536    |10.500074902 | GT              | 3 *        |
|  6.22315734182976    |10.500074902 | LT              | 1          |
|  8.75582103474613    |10.500074902 | LT              | 2          |
|  21.0488849115947    |10.500074902 | GT              | 1          |
|   26.132605745393    |10.500074902 | GT              | 2          |
|  27.0507649089989    |10.500074902 | GT              | 3 *        |
---------------------------------------------------------------------

The star represents a successful sequence of 3 in a row. The result of this set would be two because there were two occasions where the value was greater than the mean 3 times in a row.

What I am thinking is to create a new vector:

>> a = return_deltas > mean(return_deltas)

which contains ones where the values in return_deltas are greater than the mean, and then use it to count how many times the values are greater than the mean 3 times in a row. I would like to do this with a built-in function (if there is one, I have not found it), or at least without loops.

Any thoughts on how I might approach this?

Upvotes: 0

Views: 1716

Answers (1)

Prashant Kumar

With a little work, this snippet finds the starting index of every run of numbers:

[0 find(diff(v) ~= 0)] + 1

An Example:

>> v = [3 3 3 4 4 4 1 2 9 9 9 9 9];           % row vector of integers
>> run_starts = [0 find(diff(v) ~= 0)] + 1    % for floating-point data, abs(diff(v)) > EPSILON may be better

run_starts =

     1     4     7     8     9
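
As a side note on the floating-point remark, here is a minimal sketch of how the comparison `diff(v) ~= 0` could be swapped for a tolerance test. The sample vector and the value of `tol` are assumptions chosen for illustration, not part of the original data.

```matlab
% Hypothetical sample: 0.1+0.2 differs from 0.3 only by rounding error.
v   = [0.1+0.2, 0.3, 0.3, 1.5, 1.5];
tol = 1e-9;   % assumed tolerance; choose one that suits the data

% Exact comparison treats the rounding error as the start of a new run.
exact_starts = [0 find(diff(v) ~= 0)] + 1;         % [1 2 4]

% Tolerance comparison merges neighbours that differ by at most tol.
tol_starts   = [0 find(abs(diff(v)) > tol)] + 1;   % [1 4]
```

Only neighbouring elements are compared, so a slow drift of many tiny steps would still be treated as one run.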

To find the length of each run:

>> run_lengths = [diff(run_starts), length(v) - run_starts(end) + 1]

These variables make it easy to query which runs are at least a given length:

>> find(run_lengths >= 4)

ans =

     5

>> find(run_lengths >= 2)

ans =

     1     2     5

This tells us that the only run of at least four integers in a row is run #5.
However, three runs are at least two integers long, specifically runs #1, #2, and #5. The run_starts variable gives the index at which each run starts.
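
To connect this back to the question, the same run-length idea can be applied to the logical vector `return_deltas > mean(return_deltas)`. The following is only a sketch under a few assumptions: it uses just the nine values printed in the question, takes the mean quoted there as the threshold (the full vector is not available), and counts every run of at least N values above the mean as one instance. The names `threshold`, `is_hit`, `num_hits` and `hit_starts` are made up for illustration.

```matlab
% The nine values shown in the question (the real vector is longer).
return_deltas = [7.49828129642663; 11.5098198572327; 15.1776644881294; ...
                 11.256677995536;  6.22315734182976; 8.75582103474613; ...
                 21.0488849115947; 26.132605745393;  27.0507649089989];

N = 3;                           % required number of values in a row
threshold = 10.50007490258002;   % mean of the full vector, as quoted in the question

% Logical ROW vector (the [0 find(...)] trick needs a row): true above the threshold.
a = (return_deltas(:) > threshold).';

% Run starts and run lengths, computed the same way as for v above.
run_starts  = [0 find(diff(double(a)) ~= 0)] + 1;
run_lengths = [diff(run_starts), length(a) - run_starts(end) + 1];

% A hit is a run of ones (above the threshold) that is at least N long.
is_hit     = a(run_starts) & (run_lengths >= N);
num_hits   = sum(is_hit);         % number of qualifying runs
hit_starts = run_starts(is_hit);  % index where each qualifying run begins
```

For this sample, num_hits is 2 and hit_starts is [2 7], which matches the two starred sequences in the question's table. If a run of six values above the mean should count as two instances rather than one, sum(floor(run_lengths(a(run_starts)) / N)) would be one way to count that instead.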

Upvotes: 1
