Black Dynamite
Black Dynamite

Reputation: 4147

Where-Object with complex evaluation

I have a PowerShell script where I read in a CSV file, and if the date in a certain column is greater than a parameter date, I output that row to a new file.

As of now, I read the CSV file and then pipe to a ForEach-Object where if the row "passes" I store it in an Arraylist. Then when all the rows are processed, I output the Arraylist to an output CSV file. My starting CSV file is 225MB with over a quarter million rows, meaning that this process is slow.

Is there a way I can add a filter function to my piping so that only the passing rows are passed to the output CSV in one fell swoop? The current Where-Object just uses things like -like, -contains... and not more complex forms of evaluation.

For best practices, I've got my code below:

Import-Csv -Delimiter "`t" -Header $headerCounter -Path $filePath |
    Select-Object  -Skip(1)  | 
    ForEach-Object {
        #Skip the header
        if( $lineCounter -eq 1)
        {
            return
        }

        $newDate = if ([string]::IsNullOrEmpty($_.1) -eq $true) 
                        { [DateTime]::MinValue } 
                        else { [datetime]::ParseExact($_.1,”yyyyMMdd”,$null) }

        $updateDate = if ([string]::IsNullOrEmpty($_.2) -eq $true) 
                        { [DateTime]::MinValue } 
                        else { [datetime]::ParseExact($_.2,”yyyyMMdd”,$null) }

        $distanceDate = (Get-Date).AddDays($daysBack * -1)

        if( $newDate -gt $distanceDate -or $updateDate -gt $distanceDate  )
        {
            [void]$filteredArrayList.Add($_)
        }
    }
    ...
    $filteredArrayList |
    ConvertTo-Csv -Delimiter "`t" -NoTypeInformation | 
    select -Skip 1 |
    % { $_ -replace '"', ""} |
    out-file $ouputFile -fo -en unicode -Append

Upvotes: 1

Views: 161

Answers (2)

Martin Brandl
Martin Brandl

Reputation: 59021

Sure, just add a function that takes a value from the pipeline and pipe the result of Import-Csv to it. Within the function you check whether you want to filter the current item or not. Here a simple example which uses a string list and filter all strings that starts with h:

$x = @('hello', 'world', 'hello', 'tree')


filter Filter-CsvByMyRequirements
{
    Param(
        [Parameter(Mandatory=$true, 
                   ValueFromPipeline=$true)]
        $InputObject
    )

    Process
    {
        if ($_ -match '^h.*')
        {
            $_
        }
    }
}


$x | Filter-CsvByMyRequirements | Write-Host

Output:

hello
hello

Upvotes: 0

Chris Dent
Chris Dent

Reputation: 4260

I've added ConvertToDate as a function to stop that confusing the Where block.

DistanceDate is out because it appears to be calculated only once.

ExportCsv is a little function that writes pipeline input to a file.

I haven't tested it, so bugs are quite likely unless I got lucky.

function ConvertToDate {
    param(
        [String]$DateString
    )

    if ($DateString -eq '') {
        return [DateTime]::MinValue
    } else {
        return [DateTime]::ParseExact($DateString, ”yyyyMMdd”, $null)
    }
}

filter ExportCsv {
    param(
        [Parameter(Position = 1)]
        [String]$Path
    )

    $csv = $_ | ConvertTo-Csv -Delimiter "`t" | Select-Object -Last 1
    $csv -replace '"' | Out-File $Path -Append -Encoding Unicode -Force
}

$distanceDate = (Get-Date).AddDays($daysBack * -1)

Import-Csv -Delimiter "`t" -Header $headerCounter -Path $filePath |
    Select-Object  -Skip 1  |
    Where-Object { (ConvertToDate $_.1) -gt $distanceDate -or (ConvertToDate $_.2) -gt $distanceDate } |
    ExportCsv $OutputFile

Upvotes: 2

Related Questions