Reputation: 58
I have a stream of data (integers) with given (constant) frequency. From time to time I need to compute different averages (predefined). I am looking for solution to do it fast and efficient.
Assumptions:
What I have right now:
To give you more better insight there is a code which I have right now:
public class Buffer<T> : LinkedList<T>
{
private readonly int capacity;
public bool IsFull => Count >= capacity;
public Buffer(int capacity)
{
this.capacity = capacity;
}
public void Enqueue(T item)
{
if (Count == capacity)
{
RemoveFirst();
}
AddLast(item);
}
}
public class MovingAverage
{
private readonly Buffer<float> Buffer;
private static readonly object bufferLock = new object();
public Dictionary<string, float> Sums { get; private set; }
public Dictionary<string, int> Counts { get; private set; }
public MovingAverage(List<int> sampleCounts, List<string> names)
{
if (sampleCounts.Count != names.Count)
{
throw new ArgumentException("Wrong Moving Averages parameters");
}
Buffer = new Buffer<float>(sampleCounts.Max());
Sums = new Dictionary<string, float>();
Counts = new Dictionary<string, int>();
for (int i = 0; i < names.Count; i++)
{
Sums[names[i]] = 0;
Counts[names[i]] = sampleCounts[i];
}
}
public void ProcessAveraging(float val)
{
lock (bufferLock)
{
if (float.IsNaN(val))
{
val = 0;
}
foreach (var keyVal in Counts.OrderBy(a => a.Value))
{
Sums[keyVal.Key] += val;
if (Buffer.Count >= keyVal.Value)
{
Sums[keyVal.Key] -= Buffer.ElementAt(Buffer.Count - keyVal.Value);
}
}
Buffer.Enqueue(val);
}
}
public float GetLastAverage(string averageName)
{
lock (bufferLock)
{
if (Buffer.Count >= Counts[averageName])
{
return Sums[averageName] / Counts[averageName];
}
else
{
return Sums[averageName] / Buffer.Count;
}
}
}
}
That works really nice and is fast enough but in real world having 100 SPS doesnt really mean you will always have 100 samples in 1 second. Sometimes its 100, sometimes 99, sometimes 101. Computing these averages is critical for my system and 1 sample more or less could change a lot. Thats why I need a real timer telling me whether sample is already out of moving-average window or not.
The idea with adding timestamp to every sample seems to be promising
Upvotes: 0
Views: 1800
Reputation: 74605
Plenty of answers here.. Might as well add another one :)
This one might need some minor debugging for "off by one" etc - I didn't have a real dataset to work with so perhaps treat it as pseudocode
It's like yours: there's a buffer that is circular - give it enough capacity to hold N samples where N is enough to inspect your moving averages - 100 SPS and want to inspect 250ms I think you'll need at least 25, but we aren't short on space so you could make it more
struct Cirray
{
long _head;
TimedFloat[] _data;
public Cirray(int capacity)
{
_head = 0;
_data = new TimedFloat[capacity];
}
public void Add(float f)
{
_data[_head++%_data.Length] = new TimedFloat() { F = f };
}
public IEnumerable<float> GetAverages(int[] forDeltas)
{
double sum = 0;
long start = _head - 1;
long now = _data[start].T;
int whichDelta = 0;
for (long idx = start; idx >= 0 && whichDelta < forDeltas.Length; idx--)
{
if (_data[idx % _data.Length].T < now - forDeltas[whichDelta])
{
yield return (float)(sum / (start - idx));
whichDelta++;
}
sum += _data[idx % _data.Length].F;
}
}
}
struct TimedFloat
{
[DllImport("Kernel32.dll", CallingConvention = CallingConvention.Winapi)]
private static extern void GetSystemTimePreciseAsFileTime(out long filetime);
private float _f;
public float F { get => _f;
set {
_f = value;
GetSystemTimePreciseAsFileTime(out long x);
T = DateTime.FromFileTimeUtc(x).Ticks;
}
}
public long T;
}
The normal DateTime.UtcNow isn't very precise - about 16ms - so it's probably no good for timestamping data like this if youre saying that even one sample could throw it off. Instead we can make it so we get the ticks equivalent of the high resolution timer, if your system supports it (if not, you might have to change system, or abuse a StopWatch class into giving a higher resolution supplement) and we're timestamping every data item.
I thought about going to the complexity of maintaining N number of constantly moving pointers to various tail ends of the data and dec/incrementing N number of sums - it could still be done (and you clearly know how) but your question read like you'd probably call for the averages infrequently enough that an N sums/counts solution would spend more time maintaining the counts than it would to just run through 250 or 500 floats every now and then and just add them up. GetAverages as a result takes an array of ticks (10 thousand per ms) of the ranges you want the data over, e.g. new[] { 50 * 10000, 100 * 10000, 150 * 10000, 200 * 10000, 250 * 10000 }
for 50ms to 250ms in steps of 50, and it starts at the current head and sums backwards until the point where it's going to break a time boundary (and this might be the off-by-one bit) whereupon it yields the average for that timespan, then resumes summing and counting (the count given by math of the start minus the current index) for the next time span.. I think I understood right that you want e.g. the "average over the last 50ms" and "average over the last 100ms", not "average for the recent 50ms" and "average for the 50ms before recent"
Edit:
Thought about it some more and did this:
struct Cirray { long _head; TimedFloat[] _data; RunningAverage[] _ravgs;
public Cirray(int capacity)
{
_head = 0;
_data = new TimedFloat[capacity];
}
public Cirray(int capacity, int[] deltas) : this(capacity)
{
_ravgs = new RunningAverage[deltas.Length];
for (int i = 0; i < deltas.Length; i++)
_ravgs[i] = new RunningAverage() { OverMilliseconds = deltas[i] };
}
public void Add(float f)
{
//in c# every assignment returns the assigned value; capture it for use later
var addedTF = (_data[_head++ % _data.Length] = new TimedFloat() { F = f });
if (_ravgs == null)
return;
foreach (var ra in _ravgs)
{
//add the new tf to each RA
ra.Count++;
ra.Total += addedTF.F;
//move the end pointer in the RA circularly up the array, subtracting/uncounting as we go
var boundary = addedTF.T - ra.OverMilliseconds;
while (_data[ra.EndPointer].T < boundary) //while the sample is timed before the boundary, move the
{
ra.Count--;
ra.Total -= _data[ra.EndPointer].F;
ra.EndPointer = (ra.EndPointer + 1) % _data.Length; //circular indexing
}
}
}
public IEnumerable<float> GetAverages(int[] forDeltas)
{
double sum = 0;
long start = _head - 1;
long now = _data[start].T;
int whichDelta = 0;
for (long idx = start; idx >= 0 && whichDelta < forDeltas.Length; idx--)
{
if (_data[idx % _data.Length].T < now - forDeltas[whichDelta])
{
yield return (float)(sum / (start - idx));
whichDelta++;
}
sum += _data[idx % _data.Length].F;
}
}
public IEnumerable<float> GetAverages() //from the built ins
{
foreach (var ra in _ravgs)
{
if (ra.Count == 0)
yield return 0;
else
yield return (float)(ra.Total / ra.Count);
}
}
}
Absolutely haven't tested it, but it embodies my thinking in the comments
Upvotes: 1
Reputation: 2270
Instead of using a linked list I would fall back to some internal functions as array copy. In this answer I included a possible rewrite for your buffer class. Taking over the idea to keep a sum at every position.
This buffer keeps track of all the sums but in order to do that it needs to sum up every item with the new value. Based on the frequency you need to get that average it might be better to sum up when you need it and only keep the individual values.
In any way I just wanted to point out how you could do it with Array.Copy
public class BufferSum
{
private readonly int _capacity;
private readonly int _last;
private float[] _items;
public int Count { get; private set; }
public bool IsFull => Count >= _capacity;
public BufferSum(int capacity)
{
_capacity = capacity;
_last = capacity - 1;
_items = new float[_capacity];
}
public void Enqueue(float item)
{
if (Count == _capacity)
{
Array.Copy(_items, 1, _items, 0, _last);
_items[_last] = 0;
}
else
{
Count++;
}
for (var i = 0; i < Count; i ++)
{
_items[i] += item;
}
}
public float Avarage => _items[0] / Count;
public float AverageAt(int ms, int fps)
{
var _pos = Convert.ToInt32(ms / 1000 * fps);
return _items[Count - _pos] / _pos;
}
}
Additional be careful with the lock statement that will take a lot of time to.
Upvotes: 1
Reputation: 2604
Compute a series of cumulative sums1 for every block of ~1000 or so elements. (Could be less however 500 or 1000 is not that much of a difference and this will be more comfortable) You want to hold every block as long as at least one element inside is relevant. Then it can be recycled.2
When you need your current sum and you are within one block, your desired sum is:block[max_index] - block[last_relevant_number]
.
For the case when you are at the borderline of two blocks b1, b2 in this order, your desired sum is:
b1[b1.length - 1] - b1[last_relevant_number] + b2[max_index]
And we are done. The main advantage of this approach is that you don't need to know beforehands how many elements you want to keep and you can compute the result on the go.
You also don't need to handle the removal of the elements as you will naturally overwrite them when you recycle the segment - keeping the indices is all you need.
Example: let us have a constant timeseries ts = [1,1,1, .... 1]
. The cumulative sums of the series will be cumsum = [1,2,3 ... n]
. The sum from i-th to the j-th(inclusive) element of the ts will be cumsum[j] - cumsum[i - 1] = j - i - 1. For i = 5, j = 6 it will be 6 - 4 = 2 which is correct.
1 For array [1,2,3,4,5] these would be [1,3,6,10,15] - just for the sake of completeness.
2 Since you mentioned ~500 elements, two blocks should be enough.
Upvotes: 0
Reputation: 30464
You always want to remove the oldest element on one side of your sequence and add a new element at the other side of the sequence: you need a queue instead of a stack.
I think a round list will be faster: as long as you have not the maximum size, just add the elements, once you've got the maximum size, replace the oldest element.
This seems like a nice reusable class. Later we'll add the moving average part.
class RoundArray<T>
{
public RoundArray(int maxSize)
{
this.MaxSize = maxSize;
this.roundArray = new List<T>(maxSize);
}
private readonly int maxSize;
private readonly List<T> roundArray;
public int indexOldestItem = 0;
public void Add(T item)
{
// if list not full, just add
if (this.roundArray.Count < this.maxSize)
this.roundArray.Add(item);
else
{
// list is full, replace the oldest item:
this.roundArray[oldestItem] = item;
oldestItem = (oldestItem + 1) % this.maxSize;
}
public int Count => this.roundArray.Count;
public T Oldest => this.roundArray[this.indexOldestItem];
}
}
To make this class useful, add methods to enumerate the data, starting at the oldest or the newest, consider to add other useful reusable methods. Maybe you should implement IReadOnlyCollection<T>
. Maybe some private fields should have public properties.
Your moving average calculator will use this RoundArray. Whenever an item is added, and your roundArray is not full yet, the item is added to the sum and to the round array.
If the roundArray is full, then the item replaces the oldest item. You subtract the value of the OldestItem from the Sum, and add the new Item to the Sum.
class MovingAverageCalculator
{
public MovingAverageCalculator(int maxSize)
{
this.roundArray = new RoundArray<int>(maxSize);
}
private readonly RoundArray<int> roundArray;
private int sum = 0;
private int Count => this.RoundArray.Count;
private int Average => this.sum / this.Count;
public voidAdd(int value)
{
if (this.Count == this.MaxSize)
{
// replace: remove the oldest value from the sum and add the new one
this.Sum += value - this.RoundArray.Oldest;
}
else
{
// still building: just add the new value to the Sum
this.Sum += value;
}
this.RoundArray.Add(value);
}
}
Upvotes: 0
Reputation: 80187
Make an array of size 500, int counter c
.
For every sample:
summ -= A[c % 500] //remove old value
summ += sample
A[c % 500] = sample //replace it with new value
c++
if needed, calculate
average = summ / 500
Upvotes: 0