Eitanmg

Reputation: 586

Load data into memory or select multiple times

I have a process that runs every hour. As part of the process, it iterates over a text file that contains about 100K strings, and it needs to check whether each line already exists in a specific table in a SQL Server database that has about 30M records.

I have 2 options (both sketched below):

Option 1: SELECT all the strings from the table, load them into memory, and then, for each line in the file, check whether it exists in that data.

Downside: it eats up the machine's memory.

Option 2: query the database for each line in the 100K-line file (this assumes the table is indexed correctly).

Downside: it requires many round trips (100K requests) to the database.
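
For concreteness, here is a minimal PowerShell sketch of both options; the table name dbo.YourTable, its key column YourTableKey, the varchar(20) length, and the file path are all placeholder assumptions:

$connectionString = "Data Source=.;Initial Catalog=YourDatabase;Integrated Security=SSPI"
$connection = New-Object System.Data.SqlClient.SqlConnection($connectionString)
$connection.Open()

# Option 1: load every key into a HashSet once, then probe in memory.
$keys = New-Object "System.Collections.Generic.HashSet[string]"
$command = New-Object System.Data.SqlClient.SqlCommand("SELECT YourTableKey FROM dbo.YourTable;", $connection)
$reader = $command.ExecuteReader()
while ($reader.Read()) { [void]$keys.Add($reader.GetString(0)) }
$reader.Close()
foreach ($line in [System.IO.File]::ReadLines("C:\temp\temp_strings.txt")) {
    $exists = $keys.Contains($line)
}

# Option 2: one parameterized lookup per line (100K round trips).
$command.CommandText = "SELECT COUNT(*) FROM dbo.YourTable WHERE YourTableKey = @s;"
$param = $command.Parameters.Add("@s", [System.Data.SqlDbType]::VarChar, 20)
foreach ($line in [System.IO.File]::ReadLines("C:\temp\temp_strings.txt")) {
    $param.Value = $line
    $exists = ([int]$command.ExecuteScalar()) -gt 0
}
$connection.Close()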

Questions:

If I go with option 2, can SQL Server handle that number of requests?

What is the preferred way to overcome this issue?

Upvotes: 1

Views: 205

Answers (1)

Dan Guzman

Reputation: 46203

Below is example PowerShell code for another option: bulk-insert the strings into a temp table and perform the lookups as a single set-based SELECT query. I would expect this method to typically run in a few seconds, depending on your infrastructure.

$connectionString = "Data Source=.;Initial Catalog=YourDatabase;Integrated Security=SSPI"
$connection = New-Object System.Data.SqlClient.SqlConnection($connectionString)

# load strings from file into a DataTable
$timer = [System.Diagnostics.Stopwatch]::StartNew()
$dataTable = New-Object System.Data.DataTable
# single string column; MaxLength should cover the longest expected string
($dataTable.Columns.Add("StringData", [System.Type]::GetType("System.String"))).MaxLength = 20
$streamReader = New-Object System.IO.StreamReader("C:\temp\temp_strings.txt")
while ($streamReader.Peek() -ge 0) {
    $row = $dataTable.NewRow()
    $row[0] = $streamReader.ReadLine()
    [void]$dataTable.Rows.Add($row)
}
$streamReader.Close()
Write-Host "DataTable load completed. Duration $($timer.Elapsed.ToString())"

# bulk insert strings into temp table
$timer = [System.Diagnostics.Stopwatch]::StartNew()
$connection.Open()
$command = New-Object System.Data.SqlClient.SqlCommand("CREATE TABLE #temp_strings(StringValue varchar(20));", $connection)
[void]$command.ExecuteNonQuery()
$bcp = New-Object System.Data.SqlClient.SqlBulkCopy($connection)
$bcp.DestinationTableName = "#temp_strings"
$bcp.WriteToServer($dataTable)
$bcp.Close()
Write-Host "BCP completed. Duration $($timer.Elapsed.ToString())"

# execute set-based lookup query and return found/notfound for each string
$timer = [System.Diagnostics.Stopwatch]::StartNew()
$command.CommandText = @"
SELECT
      strings.StringValue
    , CASE
        WHEN YourTable.YourTableKey IS NOT NULL THEN CAST(1 AS bit)
        ELSE CAST(0 AS bit)
      END AS Found
FROM #temp_strings AS strings
LEFT JOIN dbo.YourTable ON strings.StringValue = YourTable.YourTableKey;
"@
$reader = $command.ExecuteReader()
while($reader.Read()) {
    Write-Host "String $($reader["StringValue"]) found: $($reader["Found"])"
}
$connection.Close()
Write-Host "Lookups completed. Duration $($timer.Elapsed.ToString())"

As an alternative to bulk insert, you could pass the strings using a table-valued parameter (or XML, JSON, or delimited values) for use in the query.
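
For illustration, here is a minimal sketch of the table-valued-parameter variant. It assumes a user-defined table type (dbo.StringList below is a placeholder you would create yourself) and reuses the $dataTable and $connection from the code above; a DataTable is mapped to TVP columns by ordinal position, so the differing column names do not matter.

# One-time setup on the server (the type name is an assumption):
# CREATE TYPE dbo.StringList AS TABLE (StringValue varchar(20));

$tvpCommand = New-Object System.Data.SqlClient.SqlCommand(@"
SELECT
      strings.StringValue
    , CASE
        WHEN YourTable.YourTableKey IS NOT NULL THEN CAST(1 AS bit)
        ELSE CAST(0 AS bit)
      END AS Found
FROM @Strings AS strings
LEFT JOIN dbo.YourTable ON strings.StringValue = YourTable.YourTableKey;
"@, $connection)
$tvpParam = $tvpCommand.Parameters.Add("@Strings", [System.Data.SqlDbType]::Structured)
$tvpParam.TypeName = "dbo.StringList"
$tvpParam.Value = $dataTable   # TVP columns are matched by ordinal, not by name
$connection.Open()
$reader = $tvpCommand.ExecuteReader()
while ($reader.Read()) {
    Write-Host "String $($reader["StringValue"]) found: $($reader["Found"])"
}
$connection.Close()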

Upvotes: 1
