Joey

Reputation: 354794

Weird Powershell performance issue

NOTE: This is a solution for Project Euler Problem 14. If you still want to solve it yourself then don't read on.

The problem was to find the number below one million which, used as the starting number of a Collatz sequence (repeatedly map an even n to n/2 and an odd n to 3n + 1 until reaching 1), produces the longest such sequence. My initial code was as follows:

$r = @{}

for($i = 1; $i -lt 1000000; $i++) {
    $n = 0
    $a = $i
    while ($a -gt 1) {
        if ($r[$a]) {
            $n += $r[$a]
            break
        }
        if ($a % 2 -eq 0) {
            $a /= 2
        } else {
            $a = $a * 3 + 1
        }
        $n++
    }
    $r[$i] = $n
}

$r.GetEnumerator() | sort Value | select -l 1 | %{$_.Key}

which uses a hashtable as a cache for the lengths of sub-sequences already encountered, to save time. I was quite surprised when that script had a runtime of over eight minutes on my machine. Re-creating the same code in C#:

using System;
using System.Collections.Generic;

class Problem14
{
    public static void Main()
    {
        var ht = new Dictionary<long, int>();

        for (int i = 1; i < 1000000; i++)
        {
            int count = 0;
            long j = i;

            while (j > 1)
            {
                if (ht.ContainsKey(j))
                {
                    count += ht[j];
                    break;
                }
                if (j % 2 == 0)
                    j /= 2;
                else
                    j = 3 * j + 1;

                count++;
            }
            ht[i] = count;
        }

        KeyValuePair<long, int> max = new KeyValuePair<long, int>();
        foreach (var n in ht)
        {
            if (n.Value > max.Value)
                max = n;
        }
        Console.WriteLine(max.Key);
    }
}

had a runtime of just over one second. I knew that speed of execution wasn't a prime goal of PowerShell. It's an administration language, and for those tasks the ratio of PS code to cmdlet calls is likely very different from what I do here.

Still, I don't know what exactly causes the slowdown.
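
For anyone wanting to reproduce the timings, Measure-Command is a convenient way to take them (problem14.ps1 is just a placeholder name for the script above):

Measure-Command { .\problem14.ps1 }   # returns a TimeSpan; check TotalSeconds or TotalMinutes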

Suspecting the hash table, I replaced the cache with an array (the ,0*1000000 below builds an array of one million zeros). This resulted in an execution time of around 200 ms in C# and about 32 minutes in PowerShell. The code was as follows:

$r = ,0*1000000

for($i = 1; $i -lt 1000000; $i++) {
    $n = 0
    $a = $i
    while ($a -gt 1) {
        if ($r[$a]) {
            $n += $r[$a]
            break
        }
        if ($a % 2 -eq 0) {
            $a /= 2
        } else {
            $a = $a * 3 + 1
        }
        $n++
    }
    if ($i -lt 1000000) {
        $r[$i] = $n
    }
}

$max = 0
for($i=1; $i -lt 1000000; $i++) {
    if ($r[$i] -gt $r[$max]) {
        $max = $i
    }
}
$max

and

using System;

class Problem14
{
    public static void Main()
    {
        var cache = new int[1000000];

        for (int i = 1; i < 1000000; i++)
        {
            int count = 0;
            long j = i;

            while (j > 1)
            {
                if (j < 1000000 && cache[j] != 0)
                {
                    count += cache[j];
                    break;
                }
                if (j % 2 == 0)
                    j /= 2;
                else
                    j = 3 * j + 1;

                count++;
            }
            cache[i] = count;
        }

        var max = 0;
        for (int i = 1; i < cache.Length; i++)
        {
            if (cache[i] > cache[max])
                max = i;
        }

        Console.WriteLine(max);
    }
}

A completely cacheless variant ran in around 1.2 seconds in C#; I haven't tried it in PowerShell yet.
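
For the record, a cacheless PowerShell version would presumably look like the following (an untested sketch, so no timing for it yet):

$max = 0
$maxLen = 0
for ($i = 1; $i -lt 1000000; $i++) {
    $n = 0
    $a = $i
    # walk the full sequence every time, no cache
    while ($a -gt 1) {
        if ($a % 2 -eq 0) { $a /= 2 } else { $a = $a * 3 + 1 }
        $n++
    }
    if ($n -gt $maxLen) { $maxLen = $n; $max = $i }
}
$max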

Any ideas?

Upvotes: 1

Views: 730

Answers (1)

x0n

Reputation: 52480

First and foremost, PowerShell is an interpreted language (neither jitted nor compiled). That's always going to hurt. ;-)

Secondly, there are some language constructs you can use to avoid multiple interpreted steps. For example, instead of using a for(;;) statement, use a range:

0..1000000 | foreach-object { foo $_ }

Might help a bit.
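
Applied to the question's outer loop, that suggestion would look roughly like this (a sketch only; the inner loop is elided and unchanged):

1..999999 | ForEach-Object {
    $n = 0
    $a = $_
    # ... inner Collatz/cache loop exactly as in the question ...
    $r[$_] = $n
}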

Most importantly, avoid the excessive use of the break and continue keywords in loops; invert your logic where possible to avoid them. Internally, these keywords signal flow control using .NET exceptions, and are therefore expensive operations.
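
For example, the inner loop from the question can be written without break by folding the cache check into the loop condition (a sketch against the hashtable version, not benchmarked):

while ($a -gt 1 -and -not $r[$a]) {
    if ($a % 2 -eq 0) {
        $a /= 2
    } else {
        $a = $a * 3 + 1
    }
    $n++
}
# the loop either reached 1 or stopped on a cached sub-sequence
if ($a -gt 1) {
    $n += $r[$a]
}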

Hope this helps your understanding somewhat.

EDIT: these keywords use .NET exceptions for signalling

From System.Management.Automation.FlowControlNode.Execute(...):

switch (this._tokenId)
{
    case TokenId.ExitToken:
    {
        int exitCode = this.GetExitCode(result);
        throw new ExitException(base.NodeToken, exitCode);
    }
    case TokenId.ReturnToken:
        throw new ReturnException(base.NodeToken, result);

    case TokenId.BreakToken:
        label = this.GetLabel(result, context);
        throw new BreakException(base.NodeToken, label);

    case TokenId.ContinueToken:
        label = this.GetLabel(result, context);
        throw new ContinueException(base.NodeToken, label);
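
If you want to gauge the practical cost yourself, a rough micro-benchmark is to time a loop that executes break on every iteration against one that never does (the two loops are not perfectly equivalent, but most of any gap comes from the break overhead):

Measure-Command { for ($i = 0; $i -lt 100000; $i++) { while ($true) { break } } }   # 100,000 breaks
Measure-Command { for ($i = 0; $i -lt 100000; $i++) { while ($false) { } } }        # no breaks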

-Oisin

Upvotes: 3
