Reputation:
I am trying to make a C# WinForms application that fetches data from a URL saved in a table named "Links". Each link has a "Last Checked" and a "Next Check" datetime, plus an "interval" that determines the next check from the last check.
Right now, I fetch the ID with a query BEFORE doing the web scraping, then set Last Checked to DateTime.Now and Next Check to null until everything is completed. Both are then updated after the web scraping is done.
The problem is that if an ongoing process is aborted, lastcheck will be a date but nextcheck will stay null.
So I need a better way to stop two processes from working on the same row of the same table, but I am not sure how.
Upvotes: 2
Views: 321
Reputation: 1327
First, separating controller and workers, as mentioned in the other answer, might be a better pattern. It will work better if the number of threads and the number of links to check get large.
But if your problem is this:
But problem with it is, if for any reason that scraping gets aborted/finishes halfway/doesn't work properly, LastCheck becomes DateTime.Now but NextCheck is left NULL, and previous LastCheck/NextCheck values are gone, and LastCheck/NextCheck values are updated for a link that is not actually checked
You just need to handle errors better.
The failure will result in an exception. Catch the exception and handle it by resetting the state in the database. For example:
void DoScraping(.....)
{
    try
    {
        // ....
    }
    catch (Exception err)
    {
        // oh dear, it went wrong, reset lastcheck/nextcheck
    }
}
What you reset lastcheck/nextcheck to is up to you. You could reset them to what they were at the start: when you determine "the next thing to do", also read the current values of lastcheck/nextcheck and store them in variables. Then, in the event of failure, just set them back to what they were before.
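A minimal sketch of that idea, assuming an in-memory LinkRow class as a stand-in for a row of the Links table (in the real application the assignments would be UPDATE statements). The names LinkRow, Scraper, CheckLink and the scrape delegate are hypothetical and only serve the illustration:

```csharp
using System;

// Hypothetical in-memory stand-in for one row of the "Links" table.
public class LinkRow
{
    public int Id;
    public DateTime? LastCheck;
    public DateTime? NextCheck;
    public TimeSpan Interval;
}

public static class Scraper
{
    // 'scrape' is a placeholder for the real web-scraping call.
    // Returns true if the check completed, false if it failed and the row was restored.
    public static bool CheckLink(LinkRow row, Action<LinkRow> scrape)
    {
        // Remember the values before touching the row.
        DateTime? oldLast = row.LastCheck;
        DateTime? oldNext = row.NextCheck;

        row.LastCheck = DateTime.Now;
        row.NextCheck = null; // marks the row as "being checked"

        try
        {
            scrape(row);
            // Success: schedule the next check from the new LastCheck.
            row.NextCheck = row.LastCheck.Value + row.Interval;
            return true;
        }
        catch (Exception)
        {
            // Failure: put the row back exactly as it was.
            row.LastCheck = oldLast;
            row.NextCheck = oldNext;
            return false;
        }
    }
}
```

Note that a catch block only covers failures that surface as exceptions. If the whole process is killed, the row stays half-updated, so some clean-up at start-up is still needed.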
Upvotes: 0
Reputation: 4667
For a multithreaded solution, the standard engineering approach is to use a pool of workers and a pool of work.
This is just a conceptual sketch - you should adapt it to your circumstances:
- Each worker thread takes an item from the pool of work and marks it as in_progress. This has to be done so that no two threads can take the same work. For example, you could use a lock in C# to do the query in a database, and to mark a row before returning it.
- When the work is finished or has failed, in_progress must be re-set. Typically, you could use a finally block so that you don't miss it in the event of any exception.
- Idle threads sleep. When new work appears (an INSERT, or a nextcheck is due), one of the sleeping threads is awakened.
- On start-up, re-set all in_progress flags in the event of a previous crash.

By changing the size of the worker pool, you can set the maximum number of simultaneously active workers.
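The steps above could be sketched roughly as follows. This is an illustration under assumptions, not a definitive implementation: the "table" is an in-memory list, the names WorkItem, WorkPool, TryClaim, Release and Workers.RunAll are hypothetical, and a failed item is simply rescheduled after the same interval. The sleep/wake-up part is left out; workers just exit when nothing is due.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical in-memory stand-in for a row of the "Links" table.
public class WorkItem
{
    public int Id;
    public DateTime NextCheck;
    public bool InProgress;
}

public class WorkPool
{
    private readonly object _gate = new object();
    private readonly List<WorkItem> _items;

    public WorkPool(IEnumerable<WorkItem> items)
    {
        _items = items.ToList();
        // On start-up, clear flags left behind by a previous crash.
        foreach (var item in _items) item.InProgress = false;
    }

    // Find a due item and mark it inside one critical section,
    // so two workers can never claim the same item.
    public WorkItem TryClaim(DateTime now)
    {
        lock (_gate)
        {
            var item = _items.FirstOrDefault(i => !i.InProgress && i.NextCheck <= now);
            if (item != null) item.InProgress = true;
            return item;
        }
    }

    // Schedule the next check and clear the in-progress flag.
    public void Release(WorkItem item, DateTime nextCheck)
    {
        lock (_gate)
        {
            item.NextCheck = nextCheck;
            item.InProgress = false;
        }
    }
}

public static class Workers
{
    // Runs 'workerCount' workers until no due work is left.
    public static void RunAll(WorkPool pool, int workerCount,
                              Action<WorkItem> scrape, TimeSpan interval)
    {
        var tasks = Enumerable.Range(0, workerCount).Select(_ => Task.Run(() =>
        {
            WorkItem item;
            while ((item = pool.TryClaim(DateTime.Now)) != null)
            {
                try
                {
                    scrape(item); // placeholder for the real scraping
                }
                catch (Exception)
                {
                    // A real application would log the failure here.
                }
                finally
                {
                    // Runs on success and on failure alike, so the flag is never left set.
                    pool.Release(item, DateTime.Now + interval);
                }
            }
        })).ToArray();
        Task.WaitAll(tasks);
    }
}
```

With a real database, TryClaim would run the SELECT and the UPDATE that sets in_progress inside the same lock, which only protects threads within one process. If several processes share the table, the claim would have to be made atomic in the database itself.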
Upvotes: 1