Maghis

Reputation: 1103

Best way to calculate ETA of an operation?

I am looking for the best way to calculate the ETA of an operation (e.g. a file download) using linear progress information.

Let's say I have the following method that gets called:

void ReportProgress(double position, double total)
{
    ...
}

I have a couple of ideas:

Upvotes: 16

Views: 16574

Answers (9)

Cees Timmerman

Reputation: 19674

In Python (save as timeleft.py):

"""Time left calculation example, by Cees Timmerman."""

from time import sleep, time

n: int = 10
start = time()
for i in range(n):
    sleep(0.2)
    done = (1 + i) / n
    duration = time() - start
    print(
        f"Done {done * 100:>3.0f} % in {duration:>3,.2f} seconds. "
        f"Time left: {duration / done - duration:>3,.2f} seconds."
    )

Output (run with python timeleft.py):

Done  10 % in 0.21 seconds. Time left: 1.87 seconds.
Done  20 % in 0.42 seconds. Time left: 1.69 seconds.
Done  30 % in 0.64 seconds. Time left: 1.48 seconds.
Done  40 % in 0.84 seconds. Time left: 1.26 seconds.
Done  50 % in 1.04 seconds. Time left: 1.04 seconds.
Done  60 % in 1.24 seconds. Time left: 0.83 seconds.
Done  70 % in 1.46 seconds. Time left: 0.62 seconds.
Done  80 % in 1.67 seconds. Time left: 0.42 seconds.
Done  90 % in 1.87 seconds. Time left: 0.21 seconds.
Done 100 % in 2.08 seconds. Time left: 0.00 seconds.

Upvotes: -1

Enes Okullu

Reputation: 340

I created a project for calculating ETA. You can check the code, or if you have a .NET project you can use the NuGet package CalculateETA.

For linear iterations, you can measure the total time elapsed since the iteration started and calculate:

(Simple calculation)
ETA = ElapsedTimeForCurrentIteration * (TotalCount - CurrentCount)

(Smoother, less volatile result)
ETA = (TotalElapsedTime / CurrentCount) * (TotalCount - CurrentCount)

If you use a multi-threaded process:

(Simple calculation)
ETA = (ElapsedTimeForCurrentIteration) * (TotalCount - CountOfCallingMethod)

(Smoother, less volatile result)
ETA = (TotalElapsedTime / CountOfCallingMethod) * (TotalCount - CountOfCallingMethod)

You may also apply additional smoothing to prevent sudden jumps in the result or on a graph.
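
For example, a minimal single-threaded C# sketch of the smoother formula (the loop and Sleep call only simulate work; this is illustrative and does not use the CalculateETA package's actual API):

using System;
using System.Diagnostics;
using System.Threading;

class SmoothEtaDemo
{
    static void Main()
    {
        int totalCount = 50;
        Stopwatch stopwatch = Stopwatch.StartNew();

        for (int currentCount = 1; currentCount <= totalCount; currentCount++)
        {
            Thread.Sleep(20);   // simulate one unit of work

            // Smoother formula: average time per item so far * items remaining.
            TimeSpan totalElapsed = stopwatch.Elapsed;
            TimeSpan eta = TimeSpan.FromTicks(
                totalElapsed.Ticks / currentCount * (totalCount - currentCount));

            Console.WriteLine($"{currentCount}/{totalCount} done, ETA: {eta.TotalSeconds:F1} s");
        }
    }
}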

GitHub: https://github.com/meokullu/CalculateETA/

NuGet (DotNet, C#): https://www.nuget.org/packages/CalculateETA/

Upvotes: 1

Luke

Reputation: 2464

I worked on a project that needed an ETA for a long, time-consuming computation, and what I ended up doing was splitting the process into batches of the same size. Then I time how long each batch takes to compute, and that time is added to a FIFO list of past computation times.

The times in the list are then averaged, and the resulting time is multiplied by the number of batches remaining.

number of batches = N
size of batch = x
past computation times = t1, t2, ..., tl (list length = l)
avg time per batch = (t1 + t2 + ... + tl) / l = t
computed batches = n

ETA = t * (N - n)

Note that the list has a fixed length, which should be long enough to let the estimation process "remember" and adjust to occasional spikes in computation time, but short enough to adapt quickly to changes in computation speed (e.g. more computing time or bandwidth becoming available after a competing task ends).
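
A short C# sketch of that scheme (the window length of 10 batches is an arbitrary choice here):

using System;
using System.Collections.Generic;
using System.Linq;

class BatchEtaEstimator
{
    private readonly Queue<TimeSpan> _recentBatchTimes = new Queue<TimeSpan>();
    private readonly int _windowLength;

    public BatchEtaEstimator(int windowLength = 10)
    {
        _windowLength = windowLength;
    }

    // Record how long the batch that just finished took.
    public void ReportBatch(TimeSpan batchDuration)
    {
        _recentBatchTimes.Enqueue(batchDuration);
        if (_recentBatchTimes.Count > _windowLength)
            _recentBatchTimes.Dequeue();   // FIFO: drop the oldest sample
    }

    // ETA = average time of the recent batches * number of batches remaining.
    public TimeSpan Estimate(int batchesRemaining)
    {
        if (_recentBatchTimes.Count == 0)
            return TimeSpan.Zero;

        double averageTicks = _recentBatchTimes.Average(t => t.Ticks);
        return TimeSpan.FromTicks((long)(averageTicks * batchesRemaining));
    }
}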

Upvotes: 1

Thomas

Reputation: 4255

Bram Cohen has talked about this a bit. He put a lot of effort into the ETA calculations in BitTorrent (yet in a talk he mentioned that no one has ever come up to him and said, "hey! great ETA calculations in BitTorrent, man!"). It's not a simple problem.

Some relevant links:

Upvotes: 4

Mark Ransom

Reputation: 308520

It will depend on how consistent the operation timing is. If it's consistent, it would be perfectly reasonable to use the average time of previous operations. If it's not, you're better off timing the current operation and extrapolating.

Edit: If the operation is inconsistent with previous runs, and also inconsistent from start to finish, then you have an unsolvable problem. Predicting the unpredictable is always fun :)

You might decide ahead of time whether you want to underestimate or overestimate, and add a fudge factor to the estimate. For example, if you want to overestimate and the first 10% takes 6 seconds, you might extrapolate to 60 seconds and then multiply by 1.5 to get a total estimate of 90 seconds. As the percentage complete grows, decrease the fudge factor until it becomes 1.0 at 100%.
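
A rough C# sketch of that idea (the linear decay of the fudge factor toward 1.0 is just one possible choice):

using System;

static class FudgedEta
{
    // fractionDone is in (0, 1]; elapsed is the time spent so far.
    // initialFudge > 1.0 overestimates early on and decays to 1.0 at completion.
    public static TimeSpan EstimateRemaining(double fractionDone, TimeSpan elapsed, double initialFudge = 1.5)
    {
        double naiveTotal = elapsed.TotalSeconds / fractionDone;              // plain extrapolation
        double fudge = initialFudge + (1.0 - initialFudge) * fractionDone;    // e.g. 1.5 -> 1.0 linearly
        double remaining = naiveTotal * fudge - elapsed.TotalSeconds;
        return TimeSpan.FromSeconds(Math.Max(0, remaining));
    }
}

For example, EstimateRemaining(0.10, TimeSpan.FromSeconds(6)) reports roughly 81 seconds remaining instead of the naive 54, and the padding shrinks as progress approaches 100%.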

Upvotes: 1

Jack Ryan

Reputation: 8472

I think this problem is pretty much unsolvable in general, but it is possible to produce reasonably accurate estimates with a bit more knowledge of the process that is executing. And in the cases where there are large unknowns, it is better to inform the user of those unknowns so that they can take them into account.

To take the simple example of downloading a batch of files, you have two known variables:

  • The number of files
  • The size of the files

For each file there is a constant overhead (the time it takes to establish a connection, and the time it takes to open a file on the file system). There is also the obvious download time associated with the size of the files. Creating a function that expresses this as time remaining in terms of the current download speed is easy, and accurate provided the download speed doesn't fluctuate too much. But therein lies the problem.

With an accurate model of the operation you are performing, it is easy to predict how long it will take, provided there are no outside influences. And that is rarely the case.
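
A hedged sketch of such a function in C# (the per-file overhead is an assumption here; you would have to measure it):

using System;

static class BatchDownloadEta
{
    // Model: remaining time = fixed per-file overhead for files not yet started
    //        + remaining bytes at the currently observed transfer speed.
    public static TimeSpan Estimate(
        int filesRemaining,
        long bytesRemaining,
        double currentBytesPerSecond,
        TimeSpan perFileOverhead)   // e.g. connection setup + opening the file
    {
        if (currentBytesPerSecond <= 0)
            return TimeSpan.MaxValue;   // no meaningful estimate yet

        double transferSeconds = bytesRemaining / currentBytesPerSecond;
        double overheadSeconds = filesRemaining * perFileOverhead.TotalSeconds;
        return TimeSpan.FromSeconds(transferSeconds + overheadSeconds);
    }
}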

However, you could go for a solution that attempts to understand and explain these outside influences. The user may find it helpful to be alerted when the speed changes dramatically, so they can adjust their plans to fit the new ETA. It may also help to explain which factors are affecting the current operation, e.g.

Your download will complete in 6 minutes, if the download speed stays at 50k/s

This allows the user to make some educated guesses if they know that speeds are likely to change, and ultimately leads to less frustration.

Upvotes: 4

Stringent Software

Reputation: 839

If you want an ETA rather than a 'progress bar', can you supply more than one figure?

Calculate the average download speed over a set interval (depending on how long the overall download is likely to last; if you're looking at 10+ minutes, then every 5 s or so would be OK) and keep a record of those averages.

Then you can provide two figures, an upper and lower estimate.

If you're confident that the averages are going to be a good indication of the total time to download, then you could display the 40th percentile and the 60th - if the average download times vary widely then the 10th and 90th might be better.

I'd rather see a ballpark '21-30 minutes' and it be accurate than be told 29 min 35.2 seconds and it be miles out, and varying wildly from one update to the next.
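
One way to sketch that in C# (the 40th/60th percentiles follow the suggestion above; the minimum sample count is arbitrary):

using System;
using System.Collections.Generic;
using System.Linq;

class EtaRangeEstimator
{
    private readonly List<double> _speedSamples = new List<double>();   // bytes per second

    // Call this every few seconds with the average speed over that interval.
    public void AddSample(double bytesPerSecond) => _speedSamples.Add(bytesPerSecond);

    // Returns a (low, high) ETA range from the 60th and 40th percentile speeds.
    public (TimeSpan low, TimeSpan high)? EstimateRange(long bytesRemaining)
    {
        if (_speedSamples.Count < 3)
            return null;   // not enough data for a meaningful range

        List<double> sorted = _speedSamples.OrderBy(s => s).ToList();
        double fastSpeed = Percentile(sorted, 0.60);   // optimistic speed -> lower ETA
        double slowSpeed = Percentile(sorted, 0.40);   // pessimistic speed -> upper ETA
        if (slowSpeed <= 0)
            return null;

        return (TimeSpan.FromSeconds(bytesRemaining / fastSpeed),
                TimeSpan.FromSeconds(bytesRemaining / slowSpeed));
    }

    private static double Percentile(List<double> sorted, double p)
    {
        int index = (int)Math.Round(p * (sorted.Count - 1));
        return sorted[index];
    }
}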

Upvotes: 4

James Eichele

Reputation: 119184

Something like this should do the trick:

void ReportProgress(double position, double total)
{
    static TimeType startTime;

    if (position == 0)
    {
        startTime = GetTime();
        return; // to avoid a divide-by-zero error
    }

    TimeType elapsedTime = GetTime() - startTime;
    TimeType estimatedTotalTime = elapsedTime * total / position;
    TimeType estimatedRemaining = estimatedTotalTime - elapsedTime;
    TimeType estimatedEndTime = startTime + estimatedTotalTime;

    // Print the results here
}

The estimate gets closer to the truth as the progress approaches 100%.

Upvotes: 9

paxdiablo

Reputation: 882466

I actually despise both those ideas because they've both bitten me before as a developer.

The first doesn't take into account the situation where the operation actually gets faster: it says there are 10 minutes to go, I come back after 3, and it's finished.

The second doesn't take into account the operation getting slower - I think Windows Explorer must use this method since it always seems to take 90% of the time copying 90% of the files, then another 90% of the time copying that last 10% of the files :-).

I've long since taken to calculating both those figures and averaging them. The clients don't care (they didn't really care about the other two options either, they just want to see some progress), but it makes me feel better, and that's really all I care about at the end of the day ;-)
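
The question's original ideas aren't reproduced above, but assuming the usual pair of estimates (one from the overall average rate since the start, one from the rate over a recent window), averaging them might look roughly like this C# sketch:

using System;
using System.Collections.Generic;

class BlendedEta
{
    private readonly Queue<(DateTime time, double position)> _recent = new Queue<(DateTime, double)>();
    private DateTime _startTime;
    private double _startPosition;

    public void Start(double position = 0)
    {
        _startTime = DateTime.UtcNow;
        _startPosition = position;
        _recent.Clear();
    }

    // Returns null until there is enough data for both estimates.
    public TimeSpan? Report(double position, double total)
    {
        DateTime now = DateTime.UtcNow;
        _recent.Enqueue((now, position));
        while (_recent.Count > 20)
            _recent.Dequeue();   // keep only a recent window of samples

        // Estimate 1: based on the overall average rate since the start.
        double overallRate = (position - _startPosition) / Math.Max(0.001, (now - _startTime).TotalSeconds);

        // Estimate 2: based on the rate over the recent window only.
        var oldest = _recent.Peek();
        double recentRate = (position - oldest.position) / Math.Max(0.001, (now - oldest.time).TotalSeconds);

        if (overallRate <= 0 || recentRate <= 0)
            return null;

        double remaining = total - position;
        double etaOverall = remaining / overallRate;
        double etaRecent = remaining / recentRate;
        return TimeSpan.FromSeconds((etaOverall + etaRecent) / 2);   // average the two figures
    }
}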

Upvotes: 11
