dgorka
dgorka

Reputation: 23

Speed up looping through a dictionary in VB.Net 2008

I have a process that imports a daily file of product registrations, and adds them into our database. Originally this process would query against the database multiple times for each record to determine how to process the data.

In order to speed up this process and prevent any timeout issues that could appear to people trying to use a reporting site that uses the same database, I've changed the code to pull down a few of the tables into dictionaries, and then iterate across them to see if that customer/address/dealership exists, and pull the Id from the key of the dictionary if it does, or insert it into the table and then the dictionary if it doesn't.

However, I'm currently finding this to be running slower than if I were querying the database multiple times for each registration. One possible cause that I can think of is that my dictionaries are quite large (one has 8 million entries and another has 11 million).

Here is one example of what I'm doing:

    For Each kvp As KeyValuePair(Of Int64, String) In dCust
            If kvp.Value = firstName & "|" & lastName & "|" & companyName & "|" & addrId & "|" & typeID & "|" & phone & "|" & email Then
                custId = kvp.Key
                Exit For
            End If
    Next

This dictionary has around 11 million records in it.

An idea a coworker of mine had was to run a Dictionary.ContainsValue() before the loop to see if it's even there. And if it isn't skip the loop entirely. I'd only want to try this if that runs faster than just doing the loop itself, if they take the same time I don't see a point in basically running the loop twice.

So my questions to you are:

Upvotes: 2

Views: 3913

Answers (5)

Rob
Rob

Reputation: 1390

I think the answers about the dictionary are great, but I think the broader answer is to handle this stuff at the database tier and not download millions of records to iterate through using a dictionary in C#. Why not use a table valued parameter (I'm assuming you're using SQL Server 2008) to pass in the data you want to compare and see if it exists? You'd pass it to a stored proc or something that would do the comparison all on the SQL side. You could even do something like:

INSERT ProductRegistrations
SELECT * FROM @tvpProductsToAdd pa WHERE
pa.firstName + pa.lastName + pa.companyName NOT IN
(SELECT firstName + lastName + companyName FROM ProductRegistrations)

@tvpProductsToAdd is the table valued parameter you'd pass in with your new products. You might want to create some sort of index on those fields to speed up the comparison, given that you don't seem to have keys that you can compare.

Upvotes: 0

Mohamed Nuur
Mohamed Nuur

Reputation: 5655

The point of using a dictionary is to do a quick lookup of the KEY not value. Either just use a normal array list or change your code so that you're doing a key lookup rather than a value lookup.

Upvotes: 1

Jon Skeet
Jon Skeet

Reputation: 1500225

One obvious small optimization would be to perform the concatenation of firstName, lastName etc once outside the loop. Currently you're concatenating on every iteration of the loop, which is obviously slower than it might be.

No, using ContainsValue would be no faster - that still has to do a linear search.

The obvious big optimization would be to invert the dictionary - create a Dictionary(Of String, Int64) which basically has the ID for each string value. Currently you're not using the natural benefits of a dictionary - you're essentially treating it like a list of key/value pairs.

Do you actually use the dictionary the normal way as well (looking up by the key)?

Upvotes: 5

Will A
Will A

Reputation: 24988

It looks like you're using the Dictionary in the opposite way to how it should be used - or am I missing something?

By iterating over the key, value pairs in the dictionary, you're nullifying the benefit that a dictionary (hashtable) provides - that of quick lookup of a given key value.

You should use a dictionary of (String, Int64), mapping the firatname, lastname, ... to the custId. A lookup into this would be very quick compared to what you're currently doing.

Upvotes: 3

Chris Haas
Chris Haas

Reputation: 55417

One thing you could do to speed things up is to pre-concat that search string:

Dim SearchValue as String = firstName & "|" & lastName & "|" & companyName & "|" & addrId & "|" & typeID & "|" & phone & "|" & email
For Each kvp As KeyValuePair(Of Int64, String) In dCust
        If kvp.Value = SearchValue Then
            custId = kvp.Key
            Exit For
        End If
Next

Upvotes: 1

Related Questions