Reputation: 4147
I have a PowerShell script where I read in a CSV file, and if the date in a certain column is greater than a parameter date, I output that row to a new file.
As of now, I read the CSV file and then pipe to a ForEach-Object
where if the row "passes" I store it in an Arraylist. Then when all the rows are processed, I output the Arraylist to an output CSV file. My starting CSV file is 225MB with over a quarter million rows, meaning that this process is slow.
Is there a way I can add a filter function to my piping so that only the passing rows are passed to the output CSV in one fell swoop? The current Where-Object
just uses things like -like
, -contains
... and not more complex forms of evaluation.
For best practices, I've got my code below:
Import-Csv -Delimiter "`t" -Header $headerCounter -Path $filePath |
Select-Object -Skip(1) |
ForEach-Object {
#Skip the header
if( $lineCounter -eq 1)
{
return
}
$newDate = if ([string]::IsNullOrEmpty($_.1) -eq $true)
{ [DateTime]::MinValue }
else { [datetime]::ParseExact($_.1,”yyyyMMdd”,$null) }
$updateDate = if ([string]::IsNullOrEmpty($_.2) -eq $true)
{ [DateTime]::MinValue }
else { [datetime]::ParseExact($_.2,”yyyyMMdd”,$null) }
$distanceDate = (Get-Date).AddDays($daysBack * -1)
if( $newDate -gt $distanceDate -or $updateDate -gt $distanceDate )
{
[void]$filteredArrayList.Add($_)
}
}
...
$filteredArrayList |
ConvertTo-Csv -Delimiter "`t" -NoTypeInformation |
select -Skip 1 |
% { $_ -replace '"', ""} |
out-file $ouputFile -fo -en unicode -Append
Upvotes: 1
Views: 161
Reputation: 59021
Sure, just add a function that takes a value from the pipeline and pipe the result of Import-Csv to it. Within the function you check whether you want to filter the current item or not. Here a simple example which uses a string list and filter all strings that starts with h
:
$x = @('hello', 'world', 'hello', 'tree')
filter Filter-CsvByMyRequirements
{
Param(
[Parameter(Mandatory=$true,
ValueFromPipeline=$true)]
$InputObject
)
Process
{
if ($_ -match '^h.*')
{
$_
}
}
}
$x | Filter-CsvByMyRequirements | Write-Host
Output:
hello
hello
Upvotes: 0
Reputation: 4260
I've added ConvertToDate as a function to stop that confusing the Where block.
DistanceDate is out because it appears to be calculated only once.
ExportCsv is a little function that writes pipeline input to a file.
I haven't tested it, so bugs are quite likely unless I got lucky.
function ConvertToDate {
param(
[String]$DateString
)
if ($DateString -eq '') {
return [DateTime]::MinValue
} else {
return [DateTime]::ParseExact($DateString, ”yyyyMMdd”, $null)
}
}
filter ExportCsv {
param(
[Parameter(Position = 1)]
[String]$Path
)
$csv = $_ | ConvertTo-Csv -Delimiter "`t" | Select-Object -Last 1
$csv -replace '"' | Out-File $Path -Append -Encoding Unicode -Force
}
$distanceDate = (Get-Date).AddDays($daysBack * -1)
Import-Csv -Delimiter "`t" -Header $headerCounter -Path $filePath |
Select-Object -Skip 1 |
Where-Object { (ConvertToDate $_.1) -gt $distanceDate -or (ConvertToDate $_.2) -gt $distanceDate } |
ExportCsv $OutputFile
Upvotes: 2