user8721364

How do I prevent two (or more) threads that work on a table at the same time from working on the same row?

I am writing a C# WinForms application that fetches data from URLs stored in a table named "Links". Each link has a "Last Checked" and a "Next Check" datetime, plus an "interval" that determines the next check from the last check.

Right now, I fetch the row's ID with a query BEFORE doing the web scraping, then set Last Checked to DateTime.Now and Next Check to null until everything is completed. Both are then updated once the web scraping is done.

The problem is that if an ongoing process is aborted, Last Checked will hold a date but Next Check will remain null.

So I need a better way to keep two processes from working on the same row of the same table, but I am not sure how.

Upvotes: 2

Views: 321

Answers (2)

mikelegg

Reputation: 1327

First, separating controller and workers, as mentioned in the other answer, might be a better pattern. It will work better if the number of threads and the number of links to check get large.

But if your problem is this:

But problem with it is, if for any reason that scraping gets aborted/finishes halfway/doesn't work properly, LastCheck becomes DateTime.Now but NextCheck is left NULL, and previous LastCheck/NextCheck values are gone, and LastCheck/NextCheck values are updated for a link that is not actually checked

You just need to handle errors better.

The failure will result in an exception. Catch it and handle it by resetting the state in the database. For example:

void DoScraping(.....)
{
    try
    {
        // ....
    }
    catch (Exception err)
    {
        // oh dear, it went wrong, reset lastcheck/nextcheck
    }
}

What you reset LastCheck/NextCheck to is up to you. You could restore them to their starting values: when you determine 'the next thing to do', also read the current LastCheck/NextCheck and store them in variables. Then, in the event of failure, just set them back to what they were before.
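A minimal sketch of that idea, assuming an in-memory LinkRow object as a stand-in for one row of the "Links" table. LinkRow, Scraper, and the scrape delegate are hypothetical names used only for illustration; in the real application the assignments to LastCheck/NextCheck would be UPDATE statements against the database.

```csharp
using System;

// Hypothetical in-memory stand-in for one row of the "Links" table.
public class LinkRow
{
    public int Id;
    public DateTime? LastCheck;
    public DateTime? NextCheck;
    public TimeSpan Interval;
}

public static class Scraper
{
    // 'scrape' is a placeholder for the real web-scraping call.
    // Returns true on success, false if the scrape threw.
    public static bool DoScraping(LinkRow row, Action<LinkRow> scrape)
    {
        // Remember the values as they were before this attempt.
        DateTime? oldLast = row.LastCheck;
        DateTime? oldNext = row.NextCheck;

        try
        {
            // Mark the row as being worked on, as described in the question.
            DateTime started = DateTime.Now;
            row.LastCheck = started;
            row.NextCheck = null;

            scrape(row);

            // Success: derive the next check from the interval.
            row.NextCheck = started + row.Interval;
            return true;
        }
        catch (Exception)
        {
            // Failure: put both values back as they were.
            row.LastCheck = oldLast;
            row.NextCheck = oldNext;
            return false;
        }
    }
}
```

Note that the catch block only runs if the process is still alive. If the whole process is killed mid-scrape, nothing restores the row, so some cleanup on the next start would still be needed.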

Upvotes: 0

jurez

Reputation: 4667

For a multithreaded solution, the standard engineering approach is to use a pool of workers and a pool of work.

This is just a conceptual sketch - you should adapt it to your circumstances:

  • A worker (i.e. a thread) looks at the pool of work. If there is some work available, it marks it as in_progress. This has to be done so that no two threads can take the same work. For example, you could use a lock in C# to do the query in a database, and to mark a row before returning it.
  • You need to have a way of un-marking it after the thread finishes. Successful or not, in_progress must be re-set. Typically, you could use a finally block so that you don't miss it in the event of any exception.
  • If there is no work available, the thread goes to sleep.
  • Whenever new work arrives (i.e. an INSERT, or a nextcheck becomes due), one of the sleeping threads is awakened.
  • When your program starts, it should clear any in_progress flags in the event of a previous crash.
  • You should take advantage of DBMS transactions so that any changes a worker makes after completing its work are atomic - i.e. other threads perceive them as if they had happened all at once.

By changing the size of worker pool, you can set the maximum number of simultaneously active workers.
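The claim/release part of the list above could be sketched roughly as follows. This is an illustration under several assumptions, not a definitive implementation: WorkItem and WorkPool are hypothetical names, an in-memory list stands in for the "Links" table, and the sleeping/waking of idle workers is left out (workers simply stop when nothing is due). Rescheduling a failed item after the same interval is also an assumption, made here so that a failing link is not retried in a tight loop.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical in-memory stand-in for a row of the "Links" table.
public class WorkItem
{
    public int Id;
    public DateTime NextCheck;
    public bool InProgress;   // the in_progress flag from the list above
    public int TimesChecked;  // successful checks, for illustration only
}

public class WorkPool
{
    private readonly object _gate = new object();
    private readonly List<WorkItem> _items;

    public WorkPool(IEnumerable<WorkItem> items)
    {
        _items = items.ToList();
        // On startup, clear flags left over from a previous crash.
        foreach (var item in _items) item.InProgress = false;
    }

    // Find a due, unclaimed item and mark it, all under one lock,
    // so that no two threads can claim the same item.
    public WorkItem TryClaim(DateTime now)
    {
        lock (_gate)
        {
            var item = _items.FirstOrDefault(i => !i.InProgress && i.NextCheck <= now);
            if (item != null) item.InProgress = true;
            return item;
        }
    }

    // Un-mark the item and schedule its next check in one step.
    public void Release(WorkItem item, bool succeeded, DateTime nextCheck)
    {
        lock (_gate)
        {
            if (succeeded) item.TimesChecked++;
            item.NextCheck = nextCheck; // assumption: failures wait one interval too
            item.InProgress = false;
        }
    }

    // Run 'workerCount' workers until no due work is left.
    public void Run(int workerCount, Action<WorkItem> doWork, TimeSpan interval)
    {
        var workers = Enumerable.Range(0, workerCount).Select(_ => Task.Run(() =>
        {
            WorkItem item;
            while ((item = TryClaim(DateTime.Now)) != null)
            {
                bool ok = false;
                try { doWork(item); ok = true; }
                catch (Exception) { /* a real worker would log the failure */ }
                finally { Release(item, ok, DateTime.Now + interval); } // always un-mark
            }
        })).ToArray();
        Task.WaitAll(workers);
    }
}
```

The workerCount argument plays the role of the pool size. Keep in mind that a C# lock only coordinates threads inside one process; if several separate processes share the table, the claim step would have to be made atomic in the database itself, for example inside a transaction.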

Upvotes: 1
