Reputation: 119
I have a scenario where I need to update few items based on the data from another list. I have already gone through various questions over here but none helped.
Scenario
listA: Total Count around 88000
public class CDRs
{
public string cld { get; set; }
public string prefix2 { get; set; }
public string country { get; set; }
public string city { get; set; }
}
listB: Total Count : 3000.
public class RatesVM
{
public string prefix { get; set; }
public string Country { get; set; }
public string City { get; set; }
}
Now in listB there can be multiple matches of listA field that is cld
for eg. listA.cld = "8801123232"; Matched prefixes from ListB I get is
880 BGD Proper
8801 BGD Mobile
88011 BGD Dhaka Mobile
88017 BGD Dhaka Mobile
88018 BGD Dhaka Mobile
88019 BGD Dhaka Mobile
Now I want the closest match in this case it would be
88011 BGD Dhaka Mobile
Approach I am following right now.
foreach (var x in listA)
{
var tempObj = listB.FirstOrDefault(y => x.cld.StartsWith(y.prefix));
if (tempObj != null)
{
x.prefix2 = tempObj.prefix;
x.country = tempObj.Country;
x.city = tempObj.City;
}
else
{
x.prefix2 = "InBound";
x.country = "Unknown";
x.city = "Unknown";
}
}
It works fine but takes a lot of time. Around 2-3 minutes for this case.
There are few scenarios where ListA will have around 1 million records. I am worried it will take forever.
Many Thanks in advance
Upvotes: 0
Views: 112
Reputation: 2855
Your problem is hard, because you need the 'closest' solution rather than any solution at all. This forces you to iterate of every record in listB
, for each element in listA
.
Since you need an answer for every element in listA
you are forced to check every element in it.
You can however preprocess listB
by creating a tree structure. You create a node for each different first number of all strings in B. Then that node will be the parent of all records in listB
that start with that number. The nodes below that node will hold the second number in the string, and so on.
Went ahead and drew you a visual idea of what such a tree might look like:
Now if you search in listB
, you don't have to iterate over the entire listB
but can just traverse down the list which will increase your time per iteration from O(n)
to O(log n)
.
You would take the first letter in a record in listA
and compare it to the tree, and traverse in that branch (instantly eliminating a huge amount of records you otherwise would need to compare against, increasing your performance). Then compare the second letter, etc untill no more letters can be found in the tree. When you stop, you have found the longest matching record in listB
guaranteeing the 'closest' match, which FirstOrDefault(\x -> x.StartsWith())
doesn't do at all! (It finds the first match only, which is almost always just the first letter!).
You only have to create this tree once for all searches in listA
, and if there are changes in listB
you can easily update the tree as well.
If you're running this on a decent machine with more than one core, you can also parallelize this search. It increases the complexity of the program you're writing because you need to manage which thread searches which record in listA
though it will help out greatly with the performance and will greatly lower the amount of time needed.
Upvotes: 1
Reputation: 23975
I would suggest the below code. The key difference is using orderedListB
to ensure that you get the most specific match possible (i.e. start with the longest prefixes first), as well as a Dictionary
to cache results. *
Dictionary<string, RatesVM> cache = new Dictionary<string, RatesVM>();
var orderedListB = listB.OrderByDescending(z => z.prefix.Length).ToList();
foreach (var x in listA)
{
RatesVM cached;
cache.TryGetValue(x.cld, out cached);
var tempObj = cached ?? orderedListB.FirstOrDefault(z => x.cld.StartsWith(z.prefix));
if (tempObj != null)
{
if (cached == null)
{
cache.Add(x.cld, tempObj);
}
x.prefix2 = tempObj.prefix;
x.country = tempObj.Country;
x.city = tempObj.City;
}
else
{
x.prefix2 = "InBound";
x.country = "Unknown";
x.city = "Unknown";
}
}
You may also want to consider using Parallel.ForEach
rather than just foreach.
Upvotes: 1