Mark Allison

Reputation: 7228

Performance tuning PowerShell text processing

I have an SSIS Script Task written in C# that I want to port to PowerShell so it can be used as a script. The C# version runs in 12.1s, but the PowerShell version takes 100.5s, almost an order of magnitude slower. I'm processing 11 text (CSV) files with about 3-4 million rows each, in this format:

<TICKER>,<DTYYYYMMDD>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>
AUDJPY,20010102,230100,64.30,64.30,64.30,64.30,4
AUDJPY,20010102,230300,64.29,64.29,64.29,64.29,4
<snip>

I simply want to write out to a new file the rows whose date column is 20110101 or later. Here's my C# version:

    private void ProcessFile(string fileName)
    {
        string outfile = fileName + ".processed";
        StringBuilder sb = new StringBuilder();
        using (StreamReader sr = new StreamReader(fileName))
        {
            string line;
            int year;
            while ((line = sr.ReadLine()) != null)
            {
                // Use the line already read; calling ReadLine() again would skip rows.
                year = Convert.ToInt32(line.Substring(7, 4));
                if (year >= 2011)
                {
                    sb.AppendLine(line);
                }
            }
        }

        using (StreamWriter sw = new StreamWriter(outfile))
        {
            sw.Write(sb.ToString());
        }
    }

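Here's my PowerShell version:

foreach($file in ls $PriceFolder\*.txt) {
    $outFile = $file.FullName + ".processed"
    $sr = New-Object System.IO.StreamReader($file.FullName)
    $sw = New-Object System.IO.StreamWriter($outFile)
    # Parenthesize the assignment so $line holds the text, not the result of -ne.
    while (($line = $sr.ReadLine()) -ne $null)
    {
        # Compare the year numerically (2011 or later), matching the C# version.
        if ([int]$line.Substring(7,4) -ge 2011) { $sw.WriteLine($line) }
    }
    # Close both streams so the output is flushed and the file handles are released.
    $sr.Close()
    $sw.Close()
}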

How can I get the same performance in PowerShell that I get from my C# Script Task in SSIS?

Upvotes: 2

Views: 1208

Answers (3)

manojlds

Reputation: 301147

You are translating the C# line by line into PowerShell, which is not always ideal. Yes, C# will give you better performance, but that does not mean you cannot get comparable performance with PowerShell as well.

You should try to take advantage of "streaming" in PowerShell pipelines.

For example, something like:

gc file.txt | ?{ process.....} | %{process...} | out-file out.txt

This can be faster because objects are passed along the pipeline as soon as they are available, rather than being collected first.

Can you try an equivalent that uses Get-Content and pipelining?
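
A minimal sketch of what such a pipeline could look like for the file format in the question. The function name `Select-RecentRows` and its parameters are hypothetical, made up for illustration. It assumes the year always sits at characters 7-10 of each row, as in the sample data, and it skips any line (such as a header) where those characters are not a number. Whether this actually beats the `StreamReader` loop is something to measure with `Measure-Command`, not a guaranteed speedup.

```powershell
# Hypothetical helper for illustration; not part of any module.
function Select-RecentRows {
    param(
        [Parameter(Mandatory=$true)] [string] $Path,
        [int] $MinYear = 2011
    )
    # -ReadCount sends lines down the pipeline in batches (arrays),
    # which reduces the per-object pipeline overhead.
    Get-Content -Path $Path -ReadCount 1000 | ForEach-Object {
        foreach ($line in $_) {
            $year = 0
            # Skip short lines and lines whose year field is not numeric.
            if ($line.Length -ge 11 -and
                [int]::TryParse($line.Substring(7, 4), [ref]$year) -and
                $year -ge $MinYear) {
                $line   # emit the matching row to the pipeline
            }
        }
    }
}
```

It could then be used as `Select-RecentRows -Path $file.FullName | Set-Content ($file.FullName + ".processed")` inside the existing `foreach` loop.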

Upvotes: 1

stej

Reputation: 29449

Some time ago I saw a similar question and tried to answer it; see http://social.technet.microsoft.com/Forums/en/winserverpowershell/thread/da36e346-887f-4456-b908-5ad4ddb2daa9. Frankly, the performance penalty of using PowerShell was so large that for time-consuming tasks I would always choose either C# or Add-Type, as @Roman suggested.

Upvotes: 1

Roman Kuzmin

Reputation: 42033

You cannot get PowerShell performance comparable to C# unless you actually use C# right in PowerShell. The Add-Type cmdlet lets you compile (usually trivial) C# snippets and call them directly from scripts. If performance is an issue and using C# assemblies is not possible for some reason, I would go this way.

See examples here: http://go.microsoft.com/fwlink/?LinkID=135195
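
As a rough sketch of this approach applied to the question's task: the class name `PriceFilter` and the method `FilterFile` are hypothetical, chosen only for illustration, while `Add-Type`, `StreamReader` and `StreamWriter` are the real APIs. It assumes the year is at characters 7-10 of each row and writes matching rows straight to the output file instead of buffering them in a `StringBuilder`.

```powershell
# Compile a small C# helper once per session (type and method names are hypothetical).
Add-Type -TypeDefinition @'
using System.IO;

public static class PriceFilter
{
    // Copies rows whose year (characters 7-10) is >= minYear.
    // Returns the number of rows written.
    public static int FilterFile(string inPath, string outPath, int minYear)
    {
        int written = 0;
        using (StreamReader sr = new StreamReader(inPath))
        using (StreamWriter sw = new StreamWriter(outPath))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
            {
                int year;
                // Skip short lines and lines whose year field is not numeric.
                if (line.Length >= 11 &&
                    int.TryParse(line.Substring(7, 4), out year) &&
                    year >= minYear)
                {
                    sw.WriteLine(line);
                    written++;
                }
            }
        }
        return written;
    }
}
'@
```

Inside the question's `foreach` loop the call would then be `[PriceFilter]::FilterFile($file.FullName, $file.FullName + ".processed", 2011)`. Passing full paths matters because .NET resolves relative paths against the process working directory, which is not necessarily PowerShell's current location.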

Upvotes: 2
