Reputation: 1103
I am looking for the best way to calculate the ETA of an operation (e.g. a file download) from linear progress information.
Let's say I have the following method that gets called:
void ReportProgress(double position, double total)
{
...
}
I have a couple of ideas:
Upvotes: 16
Views: 16574
Reputation: 19674
In Python (save as timeleft.py):
"""Time left calculation example, by Cees Timmerman."""
from time import sleep, time
n: int = 10
start = time()
for i in range(n):
sleep(0.2)
done = (1 + i) / n
duration = time() - start
print(
f"Done {done * 100:>3.0f} % in {duration:>3,.2f} seconds. "
f"Time left: {duration / done - duration:>3,.2f} seconds."
)
Output (run with python timeleft.py):
Done 10 % in 0.21 seconds. Time left: 1.87 seconds.
Done 20 % in 0.42 seconds. Time left: 1.69 seconds.
Done 30 % in 0.64 seconds. Time left: 1.48 seconds.
Done 40 % in 0.84 seconds. Time left: 1.26 seconds.
Done 50 % in 1.04 seconds. Time left: 1.04 seconds.
Done 60 % in 1.24 seconds. Time left: 0.83 seconds.
Done 70 % in 1.46 seconds. Time left: 0.62 seconds.
Done 80 % in 1.67 seconds. Time left: 0.42 seconds.
Done 90 % in 1.87 seconds. Time left: 0.21 seconds.
Done 100 % in 2.08 seconds. Time left: 0.00 seconds.
Upvotes: -1
Reputation: 340
I created a project for calculating ETA. You can check the code, or if you have a .NET project you can use the NuGet package: CalculateETA
For linear iterations, you can measure the total time elapsed since the iterations started, and basically do:
(Simple calculation)
ETA = ElapsedTimeForCurrentIteration * (TotalCount - CurrentCount)
(Smoother, less volatile result)
ETA = (TotalElapsedTime / CurrentCount) * (TotalCount - CurrentCount)
If you use a multi-threaded process:
(Simple calculation)
ETA = (ElapsedTimeForCurrentIteration) * (TotalCount - CountOfCallingMethod)
(Smoother, less volatile result)
ETA = (TotalElapsedTime / CountOfCallingMethod) * (TotalCount - CountOfCallingMethod)
You may also apply additional smoothing to prevent surges in the result or on a graph.
GitHub: https://github.com/meokullu/CalculateETA/
NuGet (DotNet, C#): https://www.nuget.org/packages/CalculateETA/
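As a rough Python sketch (not the package's actual API; the function and parameter names here are just illustrative), the two single-threaded formulas above look like:

```python
def eta_simple(last_iteration_seconds, total_count, current_count):
    # ETA = ElapsedTimeForCurrentIteration * (TotalCount - CurrentCount)
    # Reacts instantly to speed changes, but is volatile.
    return last_iteration_seconds * (total_count - current_count)


def eta_smooth(total_elapsed_seconds, total_count, current_count):
    # ETA = (TotalElapsedTime / CurrentCount) * (TotalCount - CurrentCount)
    # Uses the average iteration time so far, so it is much smoother.
    return (total_elapsed_seconds / current_count) * (total_count - current_count)
```

For example, 25 of 100 iterations done in 10 s gives `eta_smooth(10.0, 100, 25)` = 30 s remaining.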
Upvotes: 1
Reputation: 2464
I worked on a project that needed an ETA for a long, time-consuming computation, and what I ended up doing was splitting the process into batches of the same size. I then time how long each batch takes to compute, and add that time to a FIFO list of past computation times.
The times in the list are averaged, and the resulting average time is multiplied by the number of batches remaining.
number of batches = N
size of batch = x
past computation times kept = l (t1, t2, ..., tl)
avg time per batch = (t1 + t2 + ... + tl) / l = t
computed batches = n
ETA = t * (N - n)
Note that the list has a fixed length, which should be long enough to let the estimation process "remember" and adjust to possible peaks in computation, but also short enough to adapt rapidly to changes in computation speed (e.g. more computation time becoming available after a competing task ends, or more bandwidth).
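A minimal sketch of this scheme (class and method names are my own, not from the project described):

```python
from collections import deque


class BatchEta:
    """ETA from a fixed-length FIFO of recent per-batch times."""

    def __init__(self, total_batches, window=10):
        self.total_batches = total_batches
        self.times = deque(maxlen=window)  # oldest entry drops off when full
        self.done = 0

    def record_batch(self, seconds):
        self.times.append(seconds)
        self.done += 1

    def eta(self):
        if not self.times:
            return None                              # no data yet
        avg = sum(self.times) / len(self.times)      # t
        return avg * (self.total_batches - self.done)  # t * (N - n)
```

`deque(maxlen=...)` gives the fixed-length FIFO for free: appending to a full deque silently discards the oldest time.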
Upvotes: 1
Reputation: 4255
Bram Cohen has talked about this a bit. He put a lot of effort into the ETA calculations in BitTorrent (yet in a talk he mentioned that no one has ever come up to him and said, "hey! great ETA calculations in BitTorrent, man!"). It's not a simple problem.
Some relevant links:
Upvotes: 4
Reputation: 308520
It will depend on how consistent the operation timing is. If it's consistent, it would be perfectly reasonable to use the average time of previous operations. If it's not, you're better off timing the current operation and extrapolating.
Edit: If the operation is inconsistent with previous runs, and also inconsistent from start to finish, then you have an unsolvable problem. Predicting the unpredictable is always fun :)
You might decide ahead of time if you want to underestimate or overestimate, and add a fudge factor to the estimate. For example, if you want to overestimate, and the first 10% takes 6 seconds, you might extrapolate to 60 seconds then multiply by 1.5 to get a total estimate of 90 seconds. As the percentage complete grows, decrease the fudge factor until at 100% it becomes 1.0.
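A sketch of that idea in Python (the linear decay of the fudge factor toward 1.0 is my own choice; any schedule that reaches 1.0 at completion works):

```python
def padded_eta(elapsed, fraction_done, max_fudge=1.5):
    """Extrapolate total time from progress so far, then pad it by a
    fudge factor that shrinks linearly from max_fudge toward 1.0 as
    fraction_done approaches 1.0. Returns padded seconds remaining."""
    if fraction_done <= 0:
        return None                                   # nothing to extrapolate from
    total_estimate = elapsed / fraction_done          # plain extrapolation
    fudge = max_fudge - (max_fudge - 1.0) * fraction_done
    return total_estimate * fudge - elapsed           # padded time remaining
```

At 100% the fudge factor is exactly 1.0, so the padded estimate collapses to the true remaining time of zero.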
Upvotes: 1
Reputation: 8472
I think that this problem is pretty much unsolvable, but it is possible to produce fairly accurate estimates with a bit more knowledge of the process that is executing. And in the cases where there are large unknowns, it is better to inform the user of those unknowns so that they can take them into account.
To take the simple example of downloading a batch of files you have two known variables:
For each file there is a constant overhead (the time it takes to establish a connection and to open a file on the file system), and there is the obvious download time associated with the size of the files. Creating a function that expresses this as time remaining in terms of the current download speed is easy, and accurate provided the download speed doesn't fluctuate too much. But therein lies the problem.
With an accurate model of the operation you are performing it is easy to predict how long it will take provided there are no outside influences. And that is rarely possible.
However, you could go for a solution that attempts to understand and explain these outside influences. The user may find it helpful to be alerted when the speed changes dramatically, as they can adjust their plans to fit the new ETA. It may also be helpful to explain what factors are affecting the current operation, e.g.
Your download will complete in 6 minutes, if the download speed stays at 50k/s
This allows the user to make some educated guesses if they know that speeds are likely to change. And it ultimately leads to less frustration.
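A sketch of producing such a message, with the speed assumption spelled out (units and exact wording are my own):

```python
def eta_message(bytes_remaining, speed_bps):
    """Build an ETA message that states the assumption it depends on."""
    minutes = bytes_remaining / speed_bps / 60   # seconds remaining -> minutes
    return (f"Your download will complete in {minutes:.0f} minutes, "
            f"if the download speed stays at {speed_bps / 1024:.0f}k/s")
```

The point is not the arithmetic but the conditional clause: the user sees exactly which assumption the estimate rests on.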
Upvotes: 4
Reputation: 839
If you're wanting an ETA rather than a 'progress bar' then can you supply more than one figure?
Calculate the average download speed over a set interval (how long depends on how long the overall download is likely to last; if you're looking at 10+ minutes, then every 5 s or so would be fine) and keep a record of the averages.
Then you can provide two figures, an upper and lower estimate.
If you're confident that the averages are going to be a good indication of the total time to download, then you could display the 40th percentile and the 60th - if the average download times vary widely then the 10th and 90th might be better.
I'd rather see a ballpark '21-30 minutes' and it be accurate than be told 29 min 35.2 seconds and it be miles out, and varying wildly from one update to the next.
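A sketch of that range estimate, assuming we keep a list of per-interval speed samples and report ETAs from two percentiles of it (the nearest-rank percentile helper is my own simplification):

```python
def percentile(samples, pct):
    """Nearest-rank percentile (pct in 0-100) of a list of numbers."""
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1, round(pct / 100 * (len(ordered) - 1))))
    return ordered[k]


def eta_range(speed_samples, bytes_remaining, low_pct=40, high_pct=60):
    """Return (optimistic, pessimistic) seconds remaining.
    High speeds give short ETAs, so the high speed percentile
    yields the lower bound on time, and vice versa."""
    fast = percentile(speed_samples, high_pct)
    slow = percentile(speed_samples, low_pct)
    return bytes_remaining / fast, bytes_remaining / slow
```

Widening the percentile pair (say, 10th and 90th) widens the displayed range when the recorded speeds vary a lot, exactly as suggested above.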
Upvotes: 4
Reputation: 119184
Something like this should do the trick:
void ReportProgress(double position, double total)
{
    static TimeType startTime;
    if (position == 0)
    {
        startTime = GetTime();
        return; // to avoid a divide-by-zero error
    }
    TimeType elapsedTime = GetTime() - startTime;
    // The work done so far took elapsedTime, so the remaining work
    // should take proportionally longer: elapsed * remaining / done.
    TimeType estimatedRemaining = elapsedTime * (total - position) / position;
    TimeType estimatedEndTime = GetTime() + estimatedRemaining;
    // Print the results here
}
The estimate gets closer to the truth as the progress approaches 100%.
Upvotes: 9
Reputation: 882466
I actually despise both those ideas, because they've both bitten me before as a developer.
The first doesn't take into account the situation where the operation actually gets faster: it says there are 10 minutes to go, then I come back after 3 and it's finished.
The second doesn't take into account the operation getting slower. I think Windows Explorer must use this method, since it always seems to take 90% of the time copying 90% of the files, then another 90% of the time copying that last 10% of the files :-).
I've long since taken to calculating both those figures and averaging them. The clients don't care (they didn't really care about the other two options either, they just want to see some progress), but it makes me feel better, and that's really all I care about at the end of the day ;-)
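Assuming the two ideas are the usual pair — extrapolating from the overall average rate since the start versus from the recently measured rate — averaging them might look like this (names and the recent-rate parameter are my own assumptions):

```python
def blended_eta(elapsed, position, total, recent_rate):
    """Average two remaining-time estimates: one extrapolated from the
    overall rate since the start, one from the recent transfer rate
    (e.g. measured over the last few seconds)."""
    remaining = total - position
    overall_eta = remaining * elapsed / position   # overall-rate estimate
    recent_eta = remaining / recent_rate           # recent-rate estimate
    return (overall_eta + recent_eta) / 2
```

When the operation speeds up or slows down, the blended figure moves in the right direction, just more gently than either estimate on its own.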
Upvotes: 11