Anya Hope
Anya Hope

Reputation: 1361

Most efficient way to search large dataset for item

This has probably been asked before but I have not been able to find the same scenario.

I have been given a huge SQL dataset of a single column of strings (ie. Cities containing branches of a company).

I have also been given an even larger master dataset of strings (ie. All the cities in the world).

The first dataset changes every day. I need to run through this every single day to check whether every string can be found in the master dataset.

I have no control over the design of either dataset.

I have tried:

Foreach (cityRow in dailyDS.Tables[0].Rows)
{
     DataRow[] row = masterDs.Tables[0].Select("City like '%" + (string)cityRow["City"] + "%'");
}

and using linq (considerably faster than the select statement)

Foreach (cityRow in dailyDS.Tables[0].Rows)
{
     DataRow existingRow = masterDs.Tables[0].AsEnumerable().Where(row => row.Field<string>("City") == (string)cityRow["City"]).FirstOrDefault();
}

The entire aim of my program is to produce a list of cities where there is currently a company branch that are NOT in the master city list (ie, they have been spelt wrong etc.) I have written the program but just wanted to know:

What is the MOST efficient way to search for exact values in a dataset like this?

Upvotes: 0

Views: 1422

Answers (1)

DrKoch
DrKoch

Reputation: 9772

If you want to have it really fast, you must load all cities of the world into a HashSet<string>.

Then for each cityOfCompany check if the HashSet.Contains(cityOfCompany)

With just 23k strings to check this should run few seconds on a modern PC.

Upvotes: 2

Related Questions