Reputation: 1361
This has probably been asked before but I have not been able to find the same scenario.
I have been given a huge SQL dataset of a single column of strings (ie. Cities containing branches of a company).
I have also been given an even larger master dataset of strings (ie. All the cities in the world).
The first dataset changes every day. I need to run through this every single day to check whether every string can be found in the master dataset.
I have no control over the design of either dataset.
I have tried:
Foreach (cityRow in dailyDS.Tables[0].Rows)
{
DataRow[] row = masterDs.Tables[0].Select("City like '%" + (string)cityRow["City"] + "%'");
}
and using linq (considerably faster than the select statement)
Foreach (cityRow in dailyDS.Tables[0].Rows)
{
DataRow existingRow = masterDs.Tables[0].AsEnumerable().Where(row => row.Field<string>("City") == (string)cityRow["City"]).FirstOrDefault();
}
The entire aim of my program is to produce a list of cities where there is currently a company branch that are NOT in the master city list (ie, they have been spelt wrong etc.) I have written the program but just wanted to know:
What is the MOST efficient way to search for exact values in a dataset like this?
Upvotes: 0
Views: 1422
Reputation: 9772
If you want to have it really fast, you must load all cities of the world into a HashSet<string>
.
Then for each cityOfCompany check if the HashSet.Contains(cityOfCompany)
With just 23k strings to check this should run few seconds on a modern PC.
Upvotes: 2