Reputation: 569
I'm working on 'fixing' data in CSVs and using PowerShell because the number of rows exceeds Excel's liking and I usually also have multiple CSVs in a folder that I need to make the same updates to. I'm having trouble finding a concise way checking various columns for data and then making updates as needed. I can do it with multiple pipelines/commands, but I'd like to do it all at once and I'm also wary of doing something like $csvData = Import-CSV...
because there are A LOT of rows.
Below are the 3 commands I'm using to start with, any ideas on how to run them all at once?
Import-CSV $Filepath | %{if([string]::IsNullOrEmpty($_."Type of Admission")){$_."Type of Admission" = "9"} ; $_} | Export-CSV $NewFilepath -UseQuotes AsNeeded
Import-CSV $Filepath | %{if([string]::IsNullOrEmpty($_."Service from Dates") -And !([string]::IsNullOrEmpty($_."Service to Dates"))){$_."Service from Dates" = $_."Service to Dates"} ; $_} | Export-CSV $NewFilepath -UseQuotes AsNeeded
Import-CSV $Filepath | %{if(($_."Financial Class" -eq "Self-Pay")){$_."Payer" = $_."Health Plan Identification Number" = $_."Financial Class"} ; $_} | Export-CSV $NewFilepath -UseQuotes AsNeeded
One thing I'm trying to try is using the Switch cmdlet instead of a super nested If function. My idea with Switch was kind of like using a "With" statement in VBA and a Select Case that performs all true cases. So instead of Switch($_.FirstProperty)
or even If($_.FirstProperty)
could I just say Switch($_)
and then reference each property/column as .FirstProperty
? Alas I don't think it will work like that - but if it did, then I suppose my code would be something like below:
Import-CSV $Filepath | %{
Switch($_) {
[string]::IsNullOrEmpty($_."Type of Admission"){$_."Type of Admission" = "9"}
[string]::IsNullOrEmpty($_."Service from Dates") -And !([string]::IsNullOrEmpty($_."Service to Dates")){$_."Service from Dates" = $_."Service to Dates"}
($_."Financial Class" -eq "Self-Pay"){$_."Payer" = $_."Health Plan Identification Number" = $_."Financial Class"}
}
Export-CSV $NewFilepath -UseQuotes AsNeeded}
EDIT: For anyone planning to use Cpt.Whale's suggestion of keeping the data in memory instead of writing to disk each line; it works similar as you'd expect, but a little bit special with CSV data vs. your standard Get-Content/Set-Content workflow. Specifically the variable like $report
is going to need special characteristics. My recommendation at the current moment would be to use $outFile = New-Object System.Collections.ArrayList
and the .Add method to populate as you go. Some will say this is bad, but this question probably won't get too high on the list and it's fast and works.
Upvotes: 3
Views: 1025
Reputation: 5341
First, there are ways to read a file one line at a time, but they are usually unnecessary unless you're running out of memory, like gigabyte-plus-sized csv files. If that's the case, then I believe you want to use System.IO.File
accelerators like so instead of Import-CSV
:
foreach($row in [System.IO.File]::ReadLines($Filepath)) {
# Do Stuff to $row
}
There are more examples in the answers to this question, but you'll want to be careful of file stream locks, which can be a pain.
In addition, a huge time save can be made by processing all of your if
statements at the same time, instead of iterating through the entire file 3 times. For example:
# Import once if you have the memory.
$data = Import-CSV $Filepath
# Iterate once
Foreach ($row in $data) {
if ( [string]::IsNullOrEmpty($row."Type of Admission") ) {
$row."Type of Admission" = "9"
}
if ( [string]::IsNullOrEmpty($row."Service from Dates") -And -Not ([string]::IsNullOrEmpty($row."Service to Dates")) ) {
$row."Service from Dates" = $row."Service to Dates"
}
if ( $row."Financial Class" -eq "Self-Pay" ) {
$row."Payer" = $row."Health Plan Identification Number" = $row."Financial Class"
}
# export finalized row
$row | Export-CSV $NewFilepath -UseQuotes AsNeeded
}
To your last question about Switch, it only compares single values, and not objects. It's great for many things, but not here unless you have many more statements to add. It's better suited to handling one column of data like if 'Type of Admission' is 1 then A; 2 then B;
etc.
Upvotes: 2