Meat

Reputation: 243

What's the best way to calculate remaining download time?

Let's say you want to calculate the remaining download time, and you have all the information needed: file size, downloaded size, size left, time elapsed, momentary download speed, etc. How would you calculate the remaining download time?

Of course, the straightforward way would be either size left / momentary download speed, or (time elapsed / downloaded size) * size left. The problem is that the first is subject to fluctuations in the momentary speed, and the second doesn't adapt well when the speed changes.
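
In code, those two naive estimates look roughly like this (just a sketch; the numbers and variable names are made up for illustration):

<?php

// Two naive remaining-time estimates (sizes in kB, times in seconds).
// All values here are placeholders, purely for illustration.
$sizeLeft       = 20480;   // kB still to download
$downloadedSize = 30720;   // kB downloaded so far
$timeElapsed    = 120;     // seconds since the download started
$momentarySpeed = 350;     // kB/s, measured right now

// 1) Size left / momentary speed: jumps around with every fluctuation.
$etaMomentary = $sizeLeft / $momentarySpeed;

// 2) (Time elapsed / downloaded size) * size left: the overall average,
//    which is slow to react when the speed changes.
$etaAverage = ($timeElapsed / $downloadedSize) * $sizeLeft;

printf("momentary: %.0f s, overall average: %.0f s\n", $etaMomentary, $etaAverage);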

There must be some smarter way to do this, right? Take a look at the pirated software and music you're currently downloading with uTorrent: it's easy to notice that it does more than the simple calculation mentioned above. I've actually noticed that sometimes, when the download speed drops, the remaining time also drops for a couple of moments until it readjusts.

Upvotes: 16

Views: 11893

Answers (7)

Drew Hoskins

Reputation: 4186

You could use an averaging algorithm where the old values decay linearly. If S_n is the speed at time n and A_{n-1} is the average at time n-1, then define your average speed as follows.

A_1 = S_1
A_2 = (S_1 + S_2)/2
A_n = S_n/(n-1) + A_{n-1}(1-1/(n-1))

In English, this means that the longer in the past a measurement occurred, the less it matters because its importance has decayed.

Compare this to the normal averaging algorithm: A_n = S_n/n + A_{n-1}(1-1/n)

You could also have it geometrically decay, which would weight the most recent speeds very heavily: A_n = S_n/2 + A_{n-1}/2

If the speeds are 4, 3, 5, 6 then
A_4 = 4.5 (simple average)
A_4 = 4.8333... (linear decay)
A_4 = 5.125 (geometric decay)

Example in PHP

Beware that $n + 1 (not $n) is the number of data points so far, because PHP arrays are zero-indexed. To match the formulas above, substitute n = $n + 1 (equivalently, n - 1 = $n).

<?php

$s = [4,3,5,6];

// average
$a = [];
for ($n = 0; $n < count($s); ++$n)
{
    if ($n == 0)
        $a[$n] = $s[$n];
    else
    {
        // $n+1 = number of data points so far
        $weight = 1/($n+1);

        $a[$n] = $s[$n] * $weight + $a[$n-1] * (1 - $weight);
    }
}

var_dump($a);


// linear decay
$a = [];
for ($n = 0; $n < count($s); ++$n)
{
    if ($n == 0)
        $a[$n] = $s[$n];

    elseif ($n == 1)
        $a[$n] = ($s[$n] + $s[$n-1]) / 2;

    else
    {
        // $n = number of data points so far - 1
        $weight = 1/($n);

        $a[$n] = $s[$n] * $weight + $a[$n-1] * (1 - $weight);
    }
}

var_dump($a);


// geometric decay
$a = [];
for ($n = 0; $n < count($s); ++$n)
{
    if ($n == 0)
        $a[$n] = $s[$n];
    else
    {
        $weight = 1/2;

        $a[$n] = $s[$n] * $weight + $a[$n-1] * (1 - $weight);
    }
}

var_dump($a);

Output

array (size=4)
  0 => int 4
  1 => float 3.5
  2 => float 4
  3 => float 4.5

array (size=4)
  0 => int 4
  1 => float 3.5
  2 => float 4.25
  3 => float 4.8333333333333

array (size=4)
  0 => int 4
  1 => float 3.5
  2 => float 4.25
  3 => float 5.125

Upvotes: 11

Scott Rippey

Reputation: 15810

For anyone interested, I wrote an open-source library in C# called Progression that has a "moving-average" implementation: ETACalculator.cs.

The Progression library defines an easy-to-use structure for reporting several types of progress. It also easily handles nested progress for very smooth progress reporting.

Upvotes: 1

Meat

Reputation: 243

EDIT: Here's what I finally went with. I tried it, and it gives quite satisfying results:

I keep a zero-initialized array with one slot for each download speed between 0 and 500 kB/s (it could go higher if you expect such speeds), in 1 kB/s steps. I sample the momentary download speed periodically (every second is a good interval) and increment the corresponding array item by one. Now I know how many seconds I have spent downloading the file at each speed. The sum of all these values is the elapsed time (in seconds), and the sum of each value multiplied by its corresponding speed is the size downloaded so far. If I take the ratio between each value in the array and the elapsed time, and assume the speed variation pattern stabilizes, I can predict how long any given size will take; the size in question here is the size remaining. So that's what I do: I take the sum of each array item multiplied by its corresponding speed (the index), divide that by the elapsed time, and then divide the size left by the result. That's the time left.

It takes a few seconds to stabilize, and then works pretty damn well.

Note that this is a "complicated" average, so the method of discarding older values (moving average) might improve it even further.
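
A minimal sketch of the approach described above, in PHP to match the earlier answer (the 500 kB/s cap and the 1-second sampling interval are the values from the description; the function names are just for illustration):

<?php

// One counter per download speed from 0 to 500 kB/s, in 1 kB/s steps.
$secondsAtSpeed = array_fill(0, 501, 0);

// Call once per second with the momentary speed (kB/s) sampled at that moment.
function sampleSpeed(array &$secondsAtSpeed, int $speed)
{
    $speed = min(max($speed, 0), count($secondsAtSpeed) - 1);
    $secondsAtSpeed[$speed]++;
}

// Estimated remaining time (seconds) for $sizeLeft kB.
function estimateTimeLeft(array $secondsAtSpeed, float $sizeLeft)
{
    $elapsed    = array_sum($secondsAtSpeed);   // total seconds spent downloading
    $downloaded = 0;                            // kB downloaded so far
    foreach ($secondsAtSpeed as $speed => $seconds)
        $downloaded += $speed * $seconds;

    if ($elapsed == 0 || $downloaded == 0)
        return INF;                             // not enough data yet

    // Sum of (seconds at each speed * that speed), divided by elapsed time,
    // is the predicted rate; divide the size left by it to get the time left.
    return $sizeLeft / ($downloaded / $elapsed);
}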

Upvotes: 0

Henk Holterman

Reputation: 273621

The obvious way would be something in between: you need a 'moving average' of the download speed.

Upvotes: 7

Chad Birch

Reputation: 74618

Well, as you said, using the absolutely current download speed isn't a great method, because it tends to fluctuate. However, something like an overall average isn't a great idea either, because there may be large fluctuations there as well.

Consider if I start downloading a file at the same time as 9 others. I'm only getting 10% of my normal speed, but halfway through the file, the other 9 finish. Now I'm downloading at 10x the speed I started at. My original 10% speed shouldn't be a factor in the "how much time is left?" calculation any more.

Personally, I'd probably take an average over the last 30 seconds or so, and use that. That bases the calculation on recent speed without fluctuating wildly. 30 seconds may not be the right amount; it would take some experimentation to figure out a good value.
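
A minimal sketch of that 30-second window in PHP (matching the code style earlier in the thread; the window length is the number suggested above, everything else is made up):

<?php

$window  = 30;   // seconds of history to average over
$samples = [];   // kB downloaded during each of the last $window seconds

// Call once per second with the amount (kB) downloaded in that second.
function recordSample(array &$samples, float $kbThisSecond, int $window)
{
    $samples[] = $kbThisSecond;
    if (count($samples) > $window)
        array_shift($samples);   // drop the oldest second
}

// Remaining time (seconds) based on the average speed over the window.
function remainingTime(array $samples, float $sizeLeft)
{
    if (count($samples) == 0)
        return INF;

    $speed = array_sum($samples) / count($samples);   // kB/s over the window
    return $speed > 0 ? $sizeLeft / $speed : INF;
}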

Another option would be to set a sort of "fluctuation threshold", where you don't recalculate at all until the speed changes by more than that threshold. For example (a random number again; it would require experimentation), you could set the threshold at 10%. Then, if you're downloading at 100 kB/s, you don't recalculate the remaining time until the download speed moves below 90 kB/s or above 110 kB/s. When one of those changes happens, the time is recalculated and a new threshold is set.
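
And a sketch of the threshold idea (the 10% band and the 100 kB/s figures are the example numbers from above; the function name is just for illustration):

<?php

$threshold     = 0.10;   // recalculate only when the speed leaves a +/-10% band
$baselineSpeed = null;   // kB/s used for the currently displayed estimate
$displayedEta  = null;   // seconds

// Call whenever a new speed sample (kB/s) arrives.
function maybeRecalculate(float $currentSpeed, float $sizeLeft, float $threshold,
                          &$baselineSpeed, &$displayedEta)
{
    $outsideBand = $baselineSpeed === null
        || $currentSpeed < $baselineSpeed * (1 - $threshold)
        || $currentSpeed > $baselineSpeed * (1 + $threshold);

    if ($outsideBand && $currentSpeed > 0)
    {
        // E.g. at 100 kB/s, nothing changes until we drop below 90 or rise above 110.
        $baselineSpeed = $currentSpeed;              // new band is centered on this speed
        $displayedEta  = $sizeLeft / $currentSpeed;  // recalculate the remaining time
    }
}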

Upvotes: 12

CookieOfFortune

Reputation: 14004

I think it's just an averaging algorithm. It averages the rate over a few seconds.

Upvotes: 2

Ólafur Waage

Reputation: 70001

You could also keep track of your average speed and show a calculation based on that as well.

Upvotes: 1
