Reputation: 586
I have a process that runs every hour. As part of the process, it iterates over a text file containing about 100K strings and needs to check whether each line already exists in a specific table in a SQL Server database that has about 30M records.
I have 2 options:
Option 1: SELECT all strings from the table, load them into memory, and then check each line of the file against that in-memory data during the process.
Downside: It eats up the machine's memory.
Option 2: Query the database for each line in the 100K-line text file (assuming the table is indexed correctly).
Downside: It requires many round trips (100K requests) to the database.
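To make option 2 concrete, a rough sketch of the per-row lookup in PowerShell might look like this (the table name dbo.YourTable, key column YourTableKey, file path, and varchar(20) length are placeholder assumptions, not from my actual schema):

```powershell
# Sketch of option 2: one parameterized query per line in the file.
# dbo.YourTable / YourTableKey / the file path are placeholder names.
$connectionString = "Data Source=.;Initial Catalog=YourDatabase;Integrated Security=SSPI"
$connection = New-Object System.Data.SqlClient.SqlConnection($connectionString)
$connection.Open()
$command = New-Object System.Data.SqlClient.SqlCommand(
    "SELECT COUNT(*) FROM dbo.YourTable WHERE YourTableKey = @value;", $connection)
[void]$command.Parameters.Add("@value", [System.Data.SqlDbType]::VarChar, 20)
foreach ($line in [System.IO.File]::ReadLines("C:\temp\temp_strings.txt")) {
    # reuse the prepared parameterized command for each string
    $command.Parameters["@value"].Value = $line
    $found = ([int]$command.ExecuteScalar()) -gt 0
    Write-Host "String $line found: $found"
}
$connection.Close()
```

Each iteration is a separate network round trip, which is exactly the overhead I am worried about.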
Questions:
If I use option 2, can SQL Server handle this number of requests?
What is the preferred way to overcome this issue?
Upvotes: 1
Views: 205
Reputation: 46203
Below is example PowerShell code for another option: bulk insert the strings into a temp table and perform the lookups as a single set-based SELECT query. I would expect this method to typically run in a few seconds, depending on your infrastructure.
$connectionString = "Data Source=.;Initial Catalog=YourDatabase;Integrated Security=SSPI"
$connection = New-Object System.Data.SqlClient.SqlConnection($connectionString)
# load strings from file into a DataTable
$timer = [System.Diagnostics.Stopwatch]::StartNew()
$dataTable = New-Object System.Data.DataTable
($dataTable.Columns.Add("StringData", [System.Type]::GetType("System.String"))).MaxLength = 20
$streamReader = New-Object System.IO.StreamReader("C:\temp\temp_strings.txt")
while ($streamReader.Peek() -ge 0) {
    $row = $dataTable.NewRow()
    $row[0] = $streamReader.ReadLine()
    [void]$dataTable.Rows.Add($row)
}
$streamReader.Close()
Write-Host "DataTable load completed. Duration $($timer.Elapsed.ToString())"
# bulk insert strings into temp table
$timer = [System.Diagnostics.Stopwatch]::StartNew()
$connection.Open();
$command = New-Object System.Data.SqlClient.SqlCommand("CREATE TABLE #temp_strings(StringValue varchar(20));", $connection)
[void]$command.ExecuteNonQuery()
$bcp = New-Object System.Data.SqlClient.SqlBulkCopy($connection)
$bcp.DestinationTableName = "#temp_strings"
$bcp.WriteToServer($dataTable)
$bcp.Close()
Write-Host "BCP completed. Duration $($timer.Elapsed.ToString())"
# execute set-based lookup query and return found/notfound for each string
$timer = [System.Diagnostics.Stopwatch]::StartNew()
$command.CommandText = @"
SELECT
strings.StringValue
, CASE
WHEN YourTable.YourTableKey IS NOT NULL THEN CAST(1 AS bit)
ELSE CAST(0 AS bit)
END AS Found
FROM #temp_strings AS strings
LEFT JOIN dbo.YourTable ON strings.StringValue = YourTable.YourTableKey;
"@
$reader = $command.ExecuteReader()
while ($reader.Read()) {
    Write-Host "String $($reader["StringValue"]) found: $($reader["Found"])"
}
$reader.Close()
$connection.Close()
Write-Host "Lookups completed. Duration $($timer.Elapsed.ToString())"
As an alternative to bulk insert, you could pass the strings using a table-valued parameter (or XML, JSON, or delimited values) for use in the query.
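The table-valued parameter variant needs a user-defined table type and (typically) a stored procedure on the server side. A T-SQL sketch, using the same placeholder names (dbo.YourTable, YourTableKey) and an assumed type/procedure name:

```sql
-- Sketch of the TVP approach; type and procedure names are assumptions.
CREATE TYPE dbo.StringList AS TABLE (StringValue varchar(20) PRIMARY KEY);
GO
CREATE PROCEDURE dbo.CheckStrings
    @strings dbo.StringList READONLY
AS
SELECT
      s.StringValue
    , CASE
          WHEN t.YourTableKey IS NOT NULL THEN CAST(1 AS bit)
          ELSE CAST(0 AS bit)
      END AS Found
FROM @strings AS s
LEFT JOIN dbo.YourTable AS t ON s.StringValue = t.YourTableKey;
GO
```

From the client, you would set the SqlParameter's SqlDbType to Structured and pass the same DataTable built above as the parameter value, which avoids the temp-table round trip entirely.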
Upvotes: 1