sfarzoso
sfarzoso

Reputation: 1610

How to improve search performance

I have a list of Country objects that have this design:

public class Country
{
   public string Name { get; set; }
   public string ISO { get; set; }
}

I need to check if in a particular list a Country exists or not, so I did:

foreach (var date in dates)
{
    var newCountries = GetCountryByDate(date);

    foreach (var country in newCountries)
    {
        if (!countries.Exists(c => c.ISO == country.iso))
            countries.Add(new Country
            {
                Id = country.id,
                Name = country.name,
                ISO = country.iso,
                Link = country.id.ToString()
            });
    }
}

where newCountries contains the list of Country which need to be iterated, and countries is a List<Country> filtered by date, infact each date contains a different list of Countries.

The problem's that this mechanism is really slow

Upvotes: 0

Views: 82

Answers (1)

Fabio
Fabio

Reputation: 32455

You are checking for new countries in countries list by ISO property.

Instead of looping the list for every new country (which is O(n2)), generate HashSet<string> for ISO property which will turn search algorithm into O(n).

var existedISOCodes = countries.Select(country => country.ISO).ToHashSet();

foreach (var date in dates)
{
    var newCountries = GetCountryByDate(date);

    foreach (var country in newCountries)
    {
        if (existedISOCodes.Add(country.iso))
        {
            countries.Add(new Country
            {
                Id = country.id,
                Name = country.name,
                ISO = country.iso,
                Link = country.id.ToString()
            });
        }
    }
}

HashSet<T>.Add method will try to add given value to the set and return false if value already exists.

If you are a fan of LINQ:

var existedISOCodes = countries.Select(country => country.ISO).ToHashSet();
var newCountries = 
    dates.SelectMany(date => GetCountryByDate(date))
         .Where(country => existedISOCodes.Add(country.ISO))
         .Select(country => 
         {
             return new Country
             {
                 Id = country.id,
                 Name = country.name,
                 ISO = country.iso,
                 Link = country.id.ToString()
             };
         });

countries.AddRange(newCountries);

But I assume actual performance bottleneck is GetCountryByDate method.
If this method accessing some external resources (database, webservices) and you are not able to get countries for all date in one requests, you probably can turn GetCountryByDate into asynchronous function then you will be able to get countries for all dates almost simultaneously

var newCountryTasks = dates.Select(date => GetCountryByDateAsync(date));
await Task.WhenAll(newCountryTasks);

var newCountries = 
    newCountryTasks.SelectMany(task => task.Result)
                   .Where(country => existedISOCodes.Add(country.ISO))
                   .Select(country => 
                   {
                       return new Country
                       {
                           Id = country.id,
                           Name = country.name,
                           ISO = country.iso,
                           Link = country.id.ToString()
                       };
                   });

countries.AddRange(newCountries);

Upvotes: 3

Related Questions