Reputation: 15466
I am trying to identify if a value is repeated sequentially in a vector N times. The challenge I am facing is that it could be repeated sequentially N times several times within the vector. The purpose is to determine how many times in a row certain values fall above the mean value. For example:
>> return_deltas
return_deltas =
7.49828129642663
11.5098198572327
15.1776644881294
11.256677995536
6.22315734182976
8.75582103474613
21.0488849115947
26.132605745393
27.0507649089989
...
(I only printed a few values for example but the vector is large.)
>> mean(return_deltas)
ans =
10.50007490258002
>> sum(return_deltas > mean(return_deltas))
ans =
50
So there are 50 instances of a value in return_deltas
being greater than the mean of return_deltas
.
I need to identify the number of times, sequentially, the value in return_deltas
is greater than its mean 3 times in a row. In other words, if the values in return_deltas
are greater than its mean 3 times in a row, that is one instance.
For example:
---------------------------------------------------------------------
| `return_delta` value | mean | greater or less | sequence |
|--------------------------------------------------------------------
| 7.49828129642663 |10.500074902 | LT | 1 |
| 11.5098198572327 |10.500074902 | GT | 1 |
| 15.1776644881294 |10.500074902 | GT | 2 |
| 11.256677995536 |10.500074902 | GT | 3 * |
| 6.22315734182976 |10.500074902 | LT | 1 |
| 8.75582103474613 |10.500074902 | LT | 2 |
| 21.0488849115947 |10.500074902 | GT | 1 |
| 26.132605745393 |10.500074902 | GT | 2 |
| 27.0507649089989 |10.500074902 | GT | 3 * |
---------------------------------------------------------------------
The star represents a successful sequence of 3 in a row. The result of this set would be two because there were two occasions where the value was greater than the mean 3 times in a row.
What I am thinking is to create a new vector:
>> a = return_deltas > mean(return_deltas)
that of course contains ones where values in return_deltas
is greater than the mean and using it to find how many times sequentially, the value in return_deltas
is greater than its mean 3 times in a row. I am attempting to do this with a built in function (if there is one, I have not discovered it) or at least avoiding loops.
Any thoughts on how I might approach?
Upvotes: 0
Views: 1716
Reputation: 22529
With a little work, this snippet finds the starting index of every run of numbers:
[0 find(diff(v) ~= 0)] + 1
An Example:
>> v = [3 3 3 4 4 4 1 2 9 9 9 9 9]; # vector of integers
>> run_starts = [0 find(diff(v) ~= 0)] + 1 # may be better to diff(v) < EPSILON, for floating-point
run_starts =
1 4 7 8 9
To find the length of each run
>> run_lengths = [diff(run_starts), length(v) - run_starts(end) + 1]
This variables then makes it easy to query which runs were above a certain number
>> find(run_lengths >= 4)
ans =
5
>> find(run_lengths >= 2)
ans =
1 2 5
This tells us that the only run of at least four integers in a row was run #5.
However, there were three runs that were at least two integers in a row, specifically runs #1, #2, and #5.
You can reference where each run starts from the run_starts
variable.
Upvotes: 1