Reputation: 415
I have an array (exported from a CSV file) with ~100k lines. The file size is ~22 MB on disk. All I need is to find the line with some data, process it, and load it into MSSQL (I need to sync the CSV data with MSSQL).
The problem is that the search takes almost 1 second (~ TotalMilliseconds : 655,0788)!
$csv.Where({$_.'Device UUID' -eq 'lalala'})
Is there any way to speed it up?
Upvotes: 1
Views: 3040
Reputation: 27606
Playing around with the sqlite3 shell:
# build a test csv: a header line plus ~2.2 million random numeric "uuid" rows
'Device UUID' | set-content file.csv
1..2200kb | % { get-random } | add-content file.csv # 1.44 sec, 25mb

# import the csv into a sqlite database named 'file' (table 'file')
'.mode csv
.import file.csv file' | sqlite3 file # 2.92 sec, 81mb

# look up the last row by value
'select * from file where "device uuid" = 2143292650;' | sqlite3 file
# 'select * from file where "device uuid" > 2143292649 and "device uuid" < 2143292651;' | sqlite3 file
2143292650

# elapsed time of that last lookup:
(history)[-1] | % { $_.endexecutiontime - $_.startexecutiontime }
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 570
Ticks : 5706795
TotalDays : 6.60508680555556E-06
TotalHours : 0.000158522083333333
TotalMinutes : 0.009511325
TotalSeconds : 0.5706795
TotalMilliseconds : 570.6795
# 34 ms after this:
# 'create index deviceindex on file("device uuid");' | sqlite3 file
# with ".timer on", it's 1ms, after the table is loaded
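If the matching row then needs to come back into PowerShell as an object, the sqlite3 CLI can emit CSV that ConvertFrom-Csv understands. A minimal sketch, assuming sqlite3 is on the PATH and the 'file' database built above exists:
# run the lookup with csv output plus a header row, then parse it into a [pscustomobject]
$row = sqlite3 -header -csv file 'select * from file where "device uuid" = 2143292650;' | ConvertFrom-Csv
$row.'Device UUID'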
Upvotes: 0
Reputation: 440337
If you only need one or a few lookups, you can consider the following alternative to Mathias R. Jessen's helpful answer. Note that, like Mathias' solution, it requires reading all rows into memory at once:
# Load all rows into memory.
$allRows = Import-Csv file.csv
# Get the *index* of the row with the column value of interest.
# Note: This lookup is case-SENSITIVE.
$rowIndex = $allRows.'Device UUID'.IndexOf('lalala')
# Retrieve the row of interest by index, if found.
($rowOfInterest = if ($rowIndex -ne -1) { $allRows[$rowIndex] })
Once the rows are loaded into memory (as [pscustomobject] instances, which itself won't be fast), the array lookup - via member-access enumeration - is reasonably fast, thanks to .NET performing the (linear) array search, using the System.Array.IndexOf() method.
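If you need the lookup to be case-insensitive, one option - a sketch not taken from the answer above, and at the cost of building a second string array - is to lower-case both sides first; member-access enumeration applies .ToLower() to every column value:
# hypothetical case-insensitive variant of the index lookup
$rowIndex = $allRows.'Device UUID'.ToLower().IndexOf('LALALA'.ToLower())
($rowOfInterest = if ($rowIndex -ne -1) { $allRows[$rowIndex] })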
The problem with your .Where({ ... }) approach is that iteratively calling a PowerShell script block ({ ... }) many times is computationally expensive.
It comes down to the following trade-off:
Either: Spend more time up front to build a data structure (a [hashtable]) that allows efficient lookups (Mathias' answer),
Or: Read the file more quickly, but spend more time on each lookup (this answer).
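To see which side of the trade-off wins for your data and number of lookups, a quick (illustrative) comparison with Measure-Command might look like this; the file name is a placeholder:
# sketch: time a single lookup via each strategy against the same file
$withHashtable = Measure-Command {
    $table = @{}
    Import-Csv file.csv | ForEach-Object { $table[$_.'Device UUID'] = $_ }
    $null = $table['lalala']
}
$withIndexOf = Measure-Command {
    $rows = Import-Csv file.csv
    $null = $rows.'Device UUID'.IndexOf('lalala')
}
"hashtable: $($withHashtable.TotalMilliseconds) ms; IndexOf: $($withIndexOf.TotalMilliseconds) ms"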
Upvotes: 1
Reputation: 174990
Load all 100K rows into a hashtable, using the Device UUID property as the key - this will make it much faster to locate a row than by iterating the whole array with .Where({...}):
$deviceTable = @{}
Import-Csv .\path\to\device_list.csv | ForEach-Object {
    $deviceTable[$_.'Device UUID'] = $_
}
This will now take significantly less than 1 second:
$matchingDevice = $deviceTable['lalala']
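For the MSSQL sync itself, the hashtable then makes each per-row membership test cheap. A rough sketch - the $sqlRows collection and its DeviceUuid property are hypothetical stand-ins for whatever you read from SQL Server:
# check each UUID coming from MSSQL against the CSV-derived table (names are illustrative)
foreach ($sqlRow in $sqlRows) {
    if ($deviceTable.ContainsKey($sqlRow.DeviceUuid)) {
        $csvRow = $deviceTable[$sqlRow.DeviceUuid]
        # ... compare and update using $csvRow here ...
    }
}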
Upvotes: 5