Yong Cai
Yong Cai

Reputation: 147

Import-Csv slow performance

I have plenty of CSV files which all are large in sizes, and I used PowerShell to create a script to check any miscount columns of every rows in the records. Previously I asked a few related question on the Get-Content, Import-Csv, Excel COM API.

Count CSV columns without using Excel API

Powershell get-content ignore quoted delimiter

I using Import-Csv as suggested and I noted the performance still relatively very slow. Averagely it took around 20 minutes to execute on 50k rows of records.

Here is the testing code snippet of Import-Csv:

$path = "testing.csv"
$headerlimit = @(1..15)
$result = Import-Csv $path -Header $headerlimit |
          ForEach {@($_.PSObject.Properties| Where {$_.Value -ne $Null}).Count}
$i = 0
$line = 1
foreach ($loop in $result) {
    if ($result[$i] -ne <header columns count value here>) {
        echo  "Line: $line "
        echo "Column Found:" $result[$i]
    }
    $i++
    $line++
}

So is that the problem source is from my looping statement caused the slowing in performance or PowerShell is slower in performance compared to other programming language such as C# when execute this kind of task?

Reference:

Why cant PowerShell run loops fast ?

Update: In my previous script, I log simple details as follow(sample log of final result what am i trying to achieved):

  > ...................#1 File Name: book2.csv, Header Cell Count: 6 ...........................

    > Row No    Column No           Result 
        > 3        9            Invalid Column Count: Found mismatch no of header column vs data column 
        > 5        6            No Data Found: No data found on this row
> Total Rows Found(Exclude Header): (5)
> Total Maximum Columns Found: (9)

I wish to check on the data to ensure the csv properly exported from application database or from user manual insert the records. Hence the file size will very huge and may contain errors on it.

Upvotes: 0

Views: 1684

Answers (1)

iRon
iRon

Reputation: 23613

I completely agree with the comment from Ansgar Wiechers; Questions arise as:
Do you really need line numbers or actually the items at the concerned lines?
I have the feeling that you letting us help you to reinvent the wheel.

Anyways, to answer your question: PowerShell has a lot of cmdlets and commands, in a lot of cases you can do something with just a single command. Especially where performance is an issue, it is better to investigate in finding the right command then trying to program something yourself (and reinvent the wheel that performs less then the native solution).

In this specific case, use the Where-Object cmdlet, you can simply retrieve the concerned rows that are not aligned at once:

Import-Csv $path -Header $headerlimit | 
    Where {@($_.PSObject.Properties| Where {$_.Value -ne $Null}).Count -ne <header columns count value here>}

Upvotes: 2

Related Questions