Reputation: 15955
In analysis of energy demand and consumption data, I'm having issue re-sampling and interpolating time series trended data.
Data set example:
timestamp value kWh
------------------ ---------
12/19/2011 5:43:21 PM 79178
12/19/2011 5:58:21 PM 79179.88
12/19/2011 6:13:21 PM 79182.13
12/19/2011 6:28:21 PM 79183.88
12/19/2011 6:43:21 PM 79185.63
Based upon these observations, I'd like some aggregation to roll-up values based upon a period of time, with that frequency set to a unit of time.
As in, intervals on the hour filling any gaps of missing data
timestamp value (approx)
------------------ ---------
12/19/2011 5:00:00 PM 79173
12/19/2011 6:00:00 PM 79179
12/19/2011 7:00:00 PM 79186
For a linear algorithm, it seems I would take the difference in time and multiply the value against that factor.
TimeSpan ts = current - previous;
Double factor = ts.TotalMinutes / period;
Value and timestamp could be calculated based upon the factor.
With such quantity of available information, I'm unsure why it's difficult to find the most elegant approach to this.
Perhaps first, are there open source analysis libraries that could be recommended?
Any recommendations for a programmatic approach? Ideally C#, or possibly with SQL?
Or, any similar questions (with answers) I could be pointed to?
Upvotes: 5
Views: 6412
Reputation: 112537
By using the time-ticks that are used internally to represent DateTimes, you get the most accurate values that are possible. Since these time ticks do not restart at zero at midnight, you will not have problems at day boundaries.
// Sample times and full hour
DateTime lastSampleTimeBeforeFullHour = new DateTime(2011, 12, 19, 17, 58, 21);
DateTime firstSampleTimeAfterFullHour = new DateTime(2011, 12, 19, 18, 13, 21);
DateTime fullHour = new DateTime(2011, 12, 19, 18, 00, 00);
// Times as ticks (most accurate time unit)
long t0 = lastSampleTimeBeforeFullHour.Ticks;
long t1 = firstSampleTimeAfterFullHour.Ticks;
long tf = fullHour.Ticks;
// Energy samples
double e0 = 79179.88; // kWh before full hour
double e1 = 79182.13; // kWh after full hour
double ef; // interpolated energy at full hour
ef = e0 + (tf - t0) * (e1 - e0) / (t1 - t0); // ==> 79180.1275 kWh
Explanation of the formula
In geometry, similar triangles are triangles that have the same shape but different sizes. The formula above is based on the fact that the ratios of any two sides in one triangle are the same for the corresponding sides of a similar triangle.
If you have a triangle A B C and a similar triangle a b c, then A : B = a : b
. The equality of two ratios is called a proportion.
We can apply this proportionality rule to our problem:
(e1 – e0) / (t1 – t0) = (ef – e0) / (tf – t0)
--- large triangle -- --- small triangle --
We can multiply both sides of the equation above by (tf – t0)
:
(tf – t0) * (e1 – e0) / (t1 – t0) = (ef – e0)
By adding e0
on both sides we can isolate ef
:
e0 + (tf – t0) * (e1 – e0) / (t1 – t0) = ef
Let's swap the two sides:
ef = e0 + (tf – t0) * (e1 – e0) / (t1 – t0)
That's it. The way we calculated the intermediate value is called "linear interpolation".
Upvotes: 11
Reputation: 10040
I have written a LINQ function to interpolate and normalize time series data so that it can be aggregated/merged.
The Resample function is as follows. I have written a short article about this technique at the Code Project.
// The function is an extension method, so it must be defined in a static class.
public static class ResampleExt
{
// Resample an input time series and create a new time series between two
// particular dates sampled at a specified time interval.
public static IEnumerable<OutputDataT> Resample<InputValueT, OutputDataT>(
// Input time series to be resampled.
this IEnumerable<InputValueT> source,
// Start date of the new time series.
DateTime startDate,
// Date at which the new time series will have ended.
DateTime endDate,
// The time interval between samples.
TimeSpan resampleInterval,
// Function that selects a date/time value from an input data point.
Func<InputValueT, DateTime> dateSelector,
// Interpolation function that produces a new interpolated data point
// at a particular time between two input data points.
Func<DateTime, InputValueT, InputValueT, double, OutputDataT> interpolator
)
{
// ... argument checking omitted ...
//
// Manually enumerate the input time series...
// This is manual because the first data point must be treated specially.
//
var e = source.GetEnumerator();
if (e.MoveNext())
{
// Initialize working date to the start date, this variable will be used to
// walk forward in time towards the end date.
var workingDate = startDate;
// Extract the first data point from the input time series.
var firstDataPoint = e.Current;
// Extract the first data point's date using the date selector.
var firstDate = dateSelector(firstDataPoint);
// Loop forward in time until we reach either the date of the first
// data point or the end date, which ever comes first.
while (workingDate < endDate && workingDate <= firstDate)
{
// Until we reach the date of the first data point,
// use the interpolation function to generate an output
// data point from the first data point.
yield return interpolator(workingDate, firstDataPoint, firstDataPoint, 0);
// Walk forward in time by the specified time period.
workingDate += resampleInterval;
}
//
// Setup current data point... we will now loop over input data points and
// interpolate between the current and next data points.
//
var curDataPoint = firstDataPoint;
var curDate = firstDate;
//
// After we have reached the first data point, loop over remaining input data points until
// either the input data points have been exhausted or we have reached the end date.
//
while (workingDate < endDate && e.MoveNext())
{
// Extract the next data point from the input time series.
var nextDataPoint = e.Current;
// Extract the next data point's date using the data selector.
var nextDate = dateSelector(nextDataPoint);
// Calculate the time span between the dates of the current and next data points.
var timeSpan = nextDate - firstDate;
// Loop forward in time until wwe have moved beyond the date of the next data point.
while (workingDate <= endDate && workingDate < nextDate)
{
// The time span from the current date to the working date.
var curTimeSpan = workingDate - curDate;
// The time between the dates as a percentage (a 0-1 value).
var timePct = curTimeSpan.TotalSeconds / timeSpan.TotalSeconds;
// Interpolate an output data point at the particular time between
// the current and next data points.
yield return interpolator(workingDate, curDataPoint, nextDataPoint, timePct);
// Walk forward in time by the specified time period.
workingDate += resampleInterval;
}
// Swap the next data point into the current data point so we can move on and continue
// the interpolation with each subsqeuent data point assuming the role of
// 'next data point' in the next iteration of this loop.
curDataPoint = nextDataPoint;
curDate = nextDate;
}
// Finally loop forward in time until we reach the end date.
while (workingDate < endDate)
{
// Interpolate an output data point generated from the last data point.
yield return interpolator(workingDate, curDataPoint, curDataPoint, 1);
// Walk forward in time by the specified time period.
workingDate += resampleInterval;
}
}
}
}
Upvotes: 7
Reputation: 18863
for what you are doing it appears that you are declaring the TimeSpan incorrectly for starters ts = (TimeSpan)(current- previous); also make sure that current and previous are of DateTime type.
if you want to look at calculating or rolling up I would look at TotalHours() here is an example that you can look at for an idea if you like here is check if a LastWrite / Modified time is within a 24 hour period
if (((TimeSpan)(DateTime.Now - fiUpdateFileFile.LastWriteTime)).TotalHours < 24){}
I know that this is different that your case but you get the drift on how to use TotalHours
Upvotes: 0
Reputation: 3191
Maby something like this:
SELECT DATE_FORMAT('%Y-%m-%d %H', timestamp) as day_hour, AVG(value) as aprox FROM table GROUP BY day_hour
What database engine you use?
Upvotes: 0