Nicholas Ibarra
Nicholas Ibarra

Reputation: 504

Quickly Split CSV line that is Comma Delimited with quotes around Strings in Powerhell

I have large CSV files that 0.5-2gb+ files I am trying to import with Powershell.

Data looks like so:

Name, Date, Value

"Joe, John", 2016-08-01, "value"

"Smith, Jane", 2016-08-01, "value" ...

I have this function

$elapsed = [System.Diagnostics.Stopwatch]::StartNew()
$reader = new-object System.IO.StreamReader($csv)

while (($line = $reader.ReadLine()) -ne $null)  {

    # Use RegEx to only split on (,) outside quotes and remove quoted strings            
    $row = ($line -split ',(?=(?:[^"]|"[^"]*")*$)').Replace("`"","")

    # Row Indicator
    $i++; 

    if (($i % 50000) -eq 0) {
        Write-Host "$i rows have been processed in $($elapsed.Elapsed.ToString())." 
    }
}

Splitting the line by a comma "," works perfect as I get ~16K a second, but I need to only split outside of any quotes, so I implemented the regular expression, however the performance tanks to 900 rows a second.

I am looking for a more efficient way to loop through a CSV file that is comma delimited but has commas in the quotes that need to be excluded.

Upvotes: 1

Views: 3367

Answers (1)

Chris Dent
Chris Dent

Reputation: 4240

Import-Csv, as noted in the comments above, does not load everything into memory unless you ask it to. Like the example in the question it implements a stream reader and pushes the content it's read off to the output pipeline.

You will see significant memory usage if you do something like this:

$var = Import-Csv thefile.csv

After all, the content of the CSV has to go somewhere.

Whereas if you do something with the output pipeline there's less impact. e.g.

Import-Csv thefile.csv | ForEach-Object {
    Do-Something
}

Finally, Import-Csv really doesn't work for you I have a CSV reader class along with a side-by-side implementation of Import-Csv called Indented.Text.Csv on github. This implementation provides a public class with a number of features I needed so I could process CSV files very quickly.

Upvotes: 3

Related Questions