BlueRay101
BlueRay101

Reputation: 1487

C# Locks - Is it better to lock before a loop or inside it?

I'm currently making a Web Crawler in C#, and I have a method that receives HTML strings, extracts links from them and inserts the links into the list of all the links captured.

Since it is multi-threaded, I have used locks to prevent the list of all the strings to be accessed from a few different threads at the same time.

Which is better to do with locks?

This:

void ProcessHTML(string HTML)
{
    List<string> Links = GetLinks(HTML);
    for (int i = 0; i < Links.Count; i++)
    {
        lock (WebsitesHash)
        {
             lock (AllLinks)
             {
                  if (!WebsitesHash.ContainsKey(Links[i]))
                  {
                       WebsitesHash[Links[i]] = true;
                       AllLinks.Add(Links[i]);                    
                  }
             }
        }
    }
}

Or this:

void ProcessHTML(string HTML)
{
    List<string> Links = GetLinks(HTML);
    lock (WebsitesHash)
    {
        lock (AllLinks)
        {
             for (int i = 0; i < Links.Count; i++)
             {
                  if (!WebsitesHash.ContainsKey(Links[i]))
                  {
                       WebsitesHash[Links[i]] = true;
                       AllLinks.Add(Links[i]);
                  }
             }
        }
    }
}

Which is generally considered better to do - lock in each iteration, or lock all the iterations?

Other code that might be relevant:

void StartCrawl(string Seed)
{
    AllLinks.Capacity = 1000 * 1000 * 10;
    StreamWriter Log = new StreamWriter(File.Open("Websites.txt", FileMode.Append));
    string HTML = GetHTML(Seed);
    ProcessHTML(HTML);
    for (int i = 0; i < AllLinks.Count; i++)
    {
        if (!Work)
        {
             Log.Close();
             WebsitesHash = new Dictionary<string, bool>();
             break;
        }
        Log.WriteLine(AllLinks[i]);
        websText.Text = AllLinks.Count + "";
        try { HTML = GetHTML(AllLinks[i]); }
        catch { continue; }
        Thread Parser = new Thread(() => ProcessHTML(HTML));
        Parser.Start();
    }
}

Upvotes: 4

Views: 2704

Answers (2)

Roman Bezrabotny
Roman Bezrabotny

Reputation: 186

Let AllLinks is global storage of links:

public List<string> AllLinks = new List<string>();

Use List.BinarySearch method somewhere in code for adding new link:

// "link" contain string of html link
lock(AllLinks)
{
    int index = AllLinks.BinarySearch(link);
    if( index < 0 )
    { // link is not in AllLinks
        AllLinks.Add(~index, link);
    }
    else
    { // link exist, "index" contain its position in list
        // ...
    }
}

I think that WebsitesHash object is not necessary.

UPD Additional advantage of using BinarySearch is the sorted state of AllLinks.

Upvotes: 1

Henk Holterman
Henk Holterman

Reputation: 273524

In this case, it won't matter a whole lot.

The links are retreived outside the lock so the only action is adding a handful strings to a list. That is very small so the question is moot.

If the amount of work was larger, locking inside the loop would have been preferable.

And although locks are cheap, you can optimize a little by locking just once. You can use a private object lockObject = new object(); to be clearer about the protocol.

Upvotes: 2

Related Questions