Reputation: 7228
I have an SSIS Script Task written in C# that I want to port to PowerShell so it can be run as a script. The C# version runs in 12.1s, but the PowerShell version takes 100.5s, almost an order of magnitude slower. I'm processing 11 text files (CSV) with about 3-4 million rows each, in this format:
<TICKER>,<DTYYYYMMDD>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>
AUDJPY,20010102,230100,64.30,64.30,64.30,64.30,4
AUDJPY,20010102,230300,64.29,64.29,64.29,64.29,4
<snip>
I simply want to write the rows whose date column is 20110101 or later to a new file. Here's my C# version:
private void ProcessFile(string fileName)
{
    string outfile = fileName + ".processed";
    StringBuilder sb = new StringBuilder();
    using (StreamReader sr = new StreamReader(fileName))
    {
        string line;
        int year;
        while ((line = sr.ReadLine()) != null)
        {
            // Characters 7-10 hold the year of the date column (assumes a six-character ticker).
            // Use the line already read; calling ReadLine() again would skip rows.
            year = Convert.ToInt32(line.Substring(7, 4));
            if (year >= 2011)
            {
                sb.AppendLine(line);
            }
        }
    }
    using (StreamWriter sw = new StreamWriter(outfile))
    {
        sw.Write(sb.ToString());
    }
}
Here's my PowerShell version:
foreach ($file in ls $PriceFolder\*.txt) {
    $outFile = $file.FullName + ".processed"
    $sr = New-Object System.IO.StreamReader($file.FullName)
    $sw = New-Object System.IO.StreamWriter($outFile)
    # Parenthesize the assignment so $line holds the text, not the result of -ne.
    while (($line = $sr.ReadLine()) -ne $null)
    {
        # Keep 2011 and later, matching the C# version.
        if ([int]$line.Substring(7,4) -ge 2011) { $sw.WriteLine($line) }
    }
    $sr.Close()
    $sw.Close()
}
How can I get the same performance in PowerShell that I get from my C# Script Task in SSIS?
Upvotes: 2
Views: 1208
Reputation: 301147
You are translating the C# to PowerShell line by line, which might not be ideal in all cases. Yes, using C# will give you better performance, but that does not mean you cannot get comparable performance with PowerShell as well.
You should try to take advantage of "streaming" in PowerShell pipelines.
For example, something like:
gc file.txt | ?{ process.....} | %{process...} | out-file out.txt
This would be faster because the objects are passed along the pipeline as soon as they are available.
Can you try out an equivalent using Get-Content and pipelining?
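As a rough sketch of what that pipeline could look like for the files in the question: the function name Copy-RecentRows and its parameters are made up for illustration, and it assumes the date is always the second comma-separated field. Whether this actually beats the StreamReader loop is something to measure on real data rather than take for granted.

```powershell
# Hypothetical helper: the name and parameters are illustrative, not from the question.
function Copy-RecentRows {
    param(
        [string]$Path,
        [string]$OutPath,
        [int]$MinDate = 20110101
    )

    # Get-Content streams the file line by line into the pipeline.
    Get-Content -Path $Path |
        Where-Object {
            $fields = $_.Split(',')
            # Skip the header and malformed rows; keep rows dated $MinDate or later.
            ($fields.Length -ge 2) -and
            ($fields[1] -match '^\d{8}$') -and
            ([int]($fields[1]) -ge $MinDate)
        } |
        Set-Content -Path $OutPath
}
```

It would be called once per file, for example with -Path $file.FullName and -OutPath ($file.FullName + ".processed") inside the original foreach loop.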
Upvotes: 1
Reputation: 29449
Some time ago I saw a similar question and tried to answer it; see http://social.technet.microsoft.com/Forums/en/winserverpowershell/thread/da36e346-887f-4456-b908-5ad4ddb2daa9. Frankly, the performance penalty when using PowerShell was so large that for time-consuming tasks I would always choose either C# or Add-Type, as @Roman suggested.
Upvotes: 1
Reputation: 42033
You cannot get PowerShell performance comparable to C# unless you actually use C# right in PowerShell. The Add-Type cmdlet lets you compile small C# snippets and call them directly from scripts. If performance is an issue and using precompiled C# assemblies is not possible for some reason, then I would go this way.
See examples here: http://go.microsoft.com/fwlink/?LinkID=135195
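For the filtering task in the question, a minimal sketch of this approach might look like the following. The class name PriceFilter and the method Filter are invented for this example, and the code assumes the date is the second comma-separated field; only Add-Type itself comes from the answer above.

```powershell
# Hypothetical class and method names, used only for illustration.
Add-Type -TypeDefinition @'
using System.IO;

public static class PriceFilter
{
    // Copies lines whose second CSV field (yyyyMMdd) is >= minDate.
    // Returns the number of lines written.
    public static int Filter(string inPath, string outPath, int minDate)
    {
        int kept = 0;
        using (StreamReader sr = new StreamReader(inPath))
        using (StreamWriter sw = new StreamWriter(outPath))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
            {
                string[] fields = line.Split(',');
                int date;
                // TryParse rejects the header row and malformed dates.
                if (fields.Length > 1 && int.TryParse(fields[1], out date) && date >= minDate)
                {
                    sw.WriteLine(line);
                    kept++;
                }
            }
        }
        return kept;
    }
}
'@
```

Inside the original foreach loop the call would be something like [PriceFilter]::Filter($file.FullName, $file.FullName + ".processed", 20110101). Passing full paths matters here, because .NET resolves relative paths against the process working directory, which can differ from the current PowerShell location.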
Upvotes: 2