MarcGel

Reputation: 301

PowerShell Slowness with Two Large CSV Files

I have this script working, but with 100k+ rows in File1 and 200k+ in File2 it will take days to complete. I got the .Where({ lookup down to less than a second by loading both CSV files as data tables, but with that route I can't get the data out the way I want. The script below outputs the data the way I want, but it takes about 4 seconds per lookup. What can I do to speed this up?

I thought ContainsKey might help somewhere, but PRACT_ID has a one-to-many relationship, so I'm not sure how to handle that. Thanks.

Invoke-Expression "C:\SHC\MSO\DataTable\functionlibrary.ps1" 
[System.Data.DataTable]$Script:MappingTable = New-Object System.Data.DataTable
$File1 = Import-csv "C:\File1.csv" -Delimiter '|' | Sort-Object PRACT_ID
$File2 = Get-Content "C:\File2.csv" | Select-Object -Skip 1 | Sort-Object 
$Script:MappingTable = $File1 | Out-DataTable

$Logs = "C:\Testing1.7.csv" 

[System.Object]$UserOutput = @()

foreach ($name in $File1) {

    [string]$userMatch = $File2.Where( { $_.Split("|")[0] -eq $name.PRACT_ID })

    if ($userMatch) {
        # Process the data

        $UserOutput += New-Object PsObject -property @{
            ID_NUMBER                = $name.ID_NUMBER
            PRACT_ID                 = $name.PRACT_ID
            LAST_NAME                = $name.LAST_NAME
            FIRST_NAME               = $name.FIRST_NAME
            MIDDLE_INITIAL           = $name.MIDDLE_INITIAL
            DEGREE                   = $name.DEGREE
            EMAILADDRESS             = $name.EMAILADDRESS
            PRIMARY_CLINIC_PHONE     = $name.PRIMARY_CLINIC_PHONE
            SPECIALTY_NAME           = $name.SPECIALTY_NAME
            State_License            = $name.State_License
            NPI_Number               = $name.NPI_Number
            'University Affiliation' = $name.'University Affiliation'
            Teaching_Title           = $name.Teaching_Title
            FACILITY                 = $userMatch
        }
    }
}


$UserOutput | Select-Object ID_NUMBER, PRACT_ID, LAST_NAME, FIRST_NAME, MIDDLE_INITIAL, DEGREE, EMAILADDRESS, PRIMARY_CLINIC_PHONE, SPECIALTY_NAME, State_License, NPI_Number, 'University Affiliation', Teaching_Title, FACILITY |
Export-Csv $logs -NoTypeInformation 

Upvotes: 0

Views: 695

Answers (2)

thepip3r

Reputation: 2935

There are a multitude of ways to increase the speed of the operations you're doing, and they can be broken down into in-script and out-of-script possibilities:

Out-of-Script Possibilities:

Since the files are large, how much memory does the machine you're running this on have, and are you maxing it out during this operation?

If you are paging to disk, this will be the single biggest impact on the overall process!
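
A quick way to check is a snapshot like the rough sketch below, using the built-in Get-CimInstance and Get-Counter cmdlets (the counter path assumes an English-language Windows install):

# Snapshot of total/free memory and hard-page activity while the script runs
$os = Get-CimInstance Win32_OperatingSystem
[pscustomobject]@{
    TotalMemoryGB = [math]::Round($os.TotalVisibleMemorySize / 1MB, 1)   # WMI reports these values in KB
    FreeMemoryGB  = [math]::Round($os.FreePhysicalMemory / 1MB, 1)
    PagesPerSec   = (Get-Counter '\Memory\Pages/sec').CounterSamples[0].CookedValue
}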

If you are, the two ways to address this are:

  1. Throw more hardware at the problem (easiest to deal with)
  2. Write your code to iterate over each file a small chunk at a time so you don't load the whole thing into RAM at once (harder if you're not familiar with the pattern; see the sketch after this list).
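
Here is a minimal sketch of that chunked approach, assuming File2 is the pipe-delimited file from the question and PowerShell 5+ for the ::new() syntax; a plain .NET StreamReader keeps only one line in memory at a time:

# Stream File2 one line at a time instead of loading the whole file into RAM
$reader = [System.IO.StreamReader]::new('C:\File2.csv')
try {
    $null = $reader.ReadLine()            # skip the header row
    while (-not $reader.EndOfStream) {
        $line = $reader.ReadLine()
        $practId = $line.Split('|')[0]    # first pipe-delimited column is PRACT_ID
        # ... process or accumulate only what you need from this row ...
    }
}
finally {
    $reader.Dispose()
}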

In-Script:

  • Don't use @() with +=; it's really slow, especially over large datasets.

Use an ArrayList instead. Here is a quick sample of the performance difference (ArrayList is roughly 40x faster on 10,000 entries and 500x faster on 100,000 entries, consistently; the gap widens as the dataset grows, or in other words, @() with += gets slower as the dataset gets bigger):

(Measure-Command {
    $arr = [System.Collections.ArrayList]::new()

    1..100000 | % {
        [void]$arr.Add($_)
    }
}).TotalSeconds

(Measure-Command {
    $arr = @()
    1..100000 | % {
        $arr += $_
    }
}).TotalSeconds

0.8258113
451.5413987
  • If you need to do multiple key-based lookups on the data, iterating over the arrays millions of times will be slow. Import the data as CSV, then build one or two hashtables keyed on the lookup value (key -> data and/or key -> data[]); hashtable index lookups are MUCH faster than iterating through the arrays millions of times, assuming you have the RAM available for the extra objects. A sketch of the key -> data[] shape follows below.
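
For example, a minimal sketch of the key -> data[] shape, assuming (as in the question) that the first pipe-delimited column of File2 is PRACT_ID and that one PRACT_ID can map to several facility rows:

# Build a hashtable of PRACT_ID -> list of matching File2 lines (handles the one-to-many case)
$facilitiesById = @{}
foreach ($line in (Get-Content 'C:\File2.csv' | Select-Object -Skip 1)) {
    $key = $line.Split('|')[0]
    if (-not $facilitiesById.ContainsKey($key)) {
        $facilitiesById[$key] = [System.Collections.Generic.List[string]]::new()
    }
    $facilitiesById[$key].Add($line)
}

# O(1) lookup instead of scanning 200k+ rows for every practitioner:
# $facilitiesById['12345'] returns every facility line for that PRACT_ID (example key)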

EDIT for @RoadRunner:

My experience with Get-Content (GC) may be out of date... it used to be horrendously slow on large files, but in newer PowerShell versions that appears to have been fixed:

[System.IO.File]::WriteAllLines("$($Env:UserProfile)\Desktop\10MB.txt", ('8' * 10MB))
[System.IO.File]::WriteAllLines("$($Env:UserProfile)\Desktop\50MB.txt", ('8' * 50MB))
[System.IO.File]::WriteAllLines("$($Env:UserProfile)\Desktop\100MB.txt", ('8' * 100MB))
[System.IO.File]::WriteAllLines("$($Env:UserProfile)\Desktop\500MB.txt", ('8' * 500MB))

$10MB  = gi "$($Env:UserProfile)\Desktop\10MB.txt"
$50MB  = gi "$($Env:UserProfile)\Desktop\50MB.txt"
$100MB = gi "$($Env:UserProfile)\Desktop\100MB.txt"
$500MB = gi "$($Env:UserProfile)\Desktop\500MB.txt"

0..10 | % { 
    $n = [pscustomobject] @{
        'GC_10MB'    = (Measure-Command { Get-Content $10MB }).TotalSeconds
        'RAL_10MB'   = (Measure-Command { [System.IO.File]::ReadAllLines($10MB) }).TotalSeconds
        'GC_50MB'    = (Measure-Command { Get-Content $50MB }).TotalSeconds
        'RAL_50MB'   = (Measure-Command { [System.IO.File]::ReadAllLines($50MB) }).TotalSeconds
        'GC_100MB'   = (Measure-Command { Get-Content $100MB }).TotalSeconds
        'RAL_100MB'  = (Measure-Command { [System.IO.File]::ReadAllLines($100MB) }).TotalSeconds
        'GC_500MB'   = (Measure-Command { Get-Content $500MB }).TotalSeconds
        'RAL_500MB'  = (Measure-Command { [System.IO.File]::ReadAllLines($500MB) }).TotalSeconds
        'Delta_10MB'  = $null
        'Delta_50MB'  = $null
        'Delta_100MB' = $null
        'Delta_500MB' = $null
    }

    $n.Delta_10MB  = "{0:P}" -f ($n.GC_10MB / $n.RAL_10MB)
    $n.Delta_50MB  = "{0:P}" -f ($n.GC_50MB / $n.RAL_50MB)
    $n.Delta_100MB = "{0:P}" -f ($n.GC_100MB / $n.RAL_100MB)
    $n.Delta_500MB = "{0:P}" -f ($n.GC_500MB / $n.RAL_500MB)

    $n
}

Upvotes: 2

Mathias R. Jessen

Reputation: 174990

Load $File2 into a hashtable with the $_.Split('|')[0] value as the key; you can then also skip the object creation completely and offload everything to Select-Object:

$File2 = Get-Content "C:\File2.csv" | Select-Object -Skip 1 | Sort-Object 

# load $file2 into hashtable
$userTable = @{}
foreach($userEntry in $File2){
    $userTable[$userEntry.Split('|')[0]] = $userEntry
}

# prepare the existing property names we want to preserve
$propertiesToSelect = 'ID_NUMBER', 'PRACT_ID', 'LAST_NAME', 'FIRST_NAME', 'MIDDLE_INITIAL', 'DEGREE', 'EMAILADDRESS', 'PRIMARY_CLINIC_PHONE', 'SPECIALTY_NAME', 'State_License', 'NPI_Number', 'University Affiliation', 'Teaching_Title'

# read file, filter on existence in $userTable, add the FACILITY calculated property before export
Import-Csv "C:\File1.csv" -Delimiter '|' |
    Where-Object { $userTable.ContainsKey($_.PRACT_ID) } |
    Select-Object $propertiesToSelect, @{Name = 'FACILITY'; Expression = { $userTable[$_.PRACT_ID] }} |
    Export-Csv $logs -NoTypeInformation

Upvotes: 2
