Reputation: 741
Status: solved
I made pastebins because I need to point out line numbers.
Note: I am not using ExecutorService or thread pools; I just want to understand what is wrong with starting and using threads this way. If I use one thread, the app works perfectly!
Related links:
http://www.postgresql.org/docs/9.1/static/transaction-iso.html
http://www.postgresql.org/docs/current/static/explicit-locking.html
Main app: http://pastebin.com/i9rVyari
Logs: http://pastebin.com/2c4pU1K8 , http://pastebin.com/2S3301gD
I am starting many threads (10) in a for loop, each instantiating a Runnable class, but it seems I am getting the same result from the database with each thread: I get some string from the DB and then change it, yet every thread gets the same string (despite each thread having changed it). I am using JDBC with PostgreSQL.
What might be the usual issues?
At line 252 and line 223, the link is marked as processed (true) in the DB. Other threads of the crawler class also do this, so when line 252 fetches a link, it should be one with processed = false. But I see all the threads take the same link.
When one of the threads has crawled a link, it sets processed = true. The others should then not crawl (fetch) it, since it is marked processed = true.
getNonProcessedLinkFromDB() returns a non-processed link:
public String getNonProcessedLink(){ line 645
public boolean markLinkAsProcesed(String link){ line 705
getNonProcessedLinkFromDB looks for links with processed = false and returns one of them (LIMIT 1).
Each thread starts with an interval gap of 20 seconds; within one thread, the estimated processing time for crawling is 1 or 2 seconds.
Line 98 is supposed to keep threads from grabbing the same URL.
If you look at the result, one thread set it to true, yet the others still access it, way after some time. All the threads are separate; even a single one races. The DB marks the link true the moment the first thread processes it.
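The failure described above is a classic check-then-act race: fetching the link and marking it processed are two separate steps, so another thread can fetch in the gap between them. The sketch below reproduces that window deterministically with a hypothetical in-memory stand-in for the links table (the real app uses JDBC/PostgreSQL; the method names mirror the ones in the question):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RaceDemo {
    // Hypothetical stand-in for the links table: url -> processed flag.
    static final Map<String, Boolean> processed = new LinkedHashMap<>();

    // Mirrors getNonProcessedLinkFromDB(): return one link with processed = false.
    static String getNonProcessedLink() {
        for (Map.Entry<String, Boolean> e : processed.entrySet()) {
            if (!e.getValue()) return e.getKey();
        }
        return null;
    }

    // Mirrors markLinkAsProcesed(): set processed = true.
    static void markLinkAsProcesed(String link) {
        processed.put(link, true);
    }

    public static void main(String[] args) {
        processed.put("http://example.com/a", false);
        // Thread 1 fetches a link...
        String seenByThread1 = getNonProcessedLink();
        // ...but before it calls markLinkAsProcesed(), thread 2 fetches too.
        String seenByThread2 = getNonProcessedLink();
        markLinkAsProcesed(seenByThread1); // too late: both threads now hold the same link
        System.out.println(seenByThread1.equals(seenByThread2)); // prints "true"
    }
}
```

Here the interleaving is simulated sequentially for clarity; with real threads the same overlap simply happens nondeterministically.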
Upvotes: 0
Views: 650
Reputation: 741
The comments and responses from the helpers in this post were also correct. But adding, at the start of the crawl() method body:
synchronized(Crawler.class){
    url = getNonProcessedLinkFromDB();
    new BasicDAO().markLinkAsProcesed(url);
}
and, at the bottom of the crawl() method body (when it has finished processing):
crawl(nonProcessedLinkFromDB);
actually solved the issue.
The problem was the gap between marking a link processed = true and fetching a new one, which let other threads get the same link while the current thread was still working on it. The synchronized block helped further.
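A minimal runnable sketch of that fix, again with an in-memory stand-in for the database (the class names and link count are assumptions; the real code goes through JDBC): fetching and marking happen under one class-level lock, so no other thread can fetch between the two steps.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentLinkedQueue;

public class SyncFixDemo {
    static final List<String> links = Collections.synchronizedList(new ArrayList<>());
    static final Set<String> processed = Collections.synchronizedSet(new HashSet<>());
    static final ConcurrentLinkedQueue<String> claimed = new ConcurrentLinkedQueue<>();

    // Stand-in for getNonProcessedLinkFromDB().
    static String getNonProcessedLink() {
        for (String l : links) if (!processed.contains(l)) return l;
        return null;
    }

    static void crawl() {
        String url;
        // The synchronized block from the answer: fetch + mark are now atomic.
        synchronized (SyncFixDemo.class) {
            url = getNonProcessedLink();
            if (url == null) return;
            processed.add(url); // stand-in for markLinkAsProcesed(url)
        }
        claimed.add(url); // simulate crawling the claimed url
        crawl();          // the answer recurses into crawl() when done
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 50; i++) links.add("http://example.com/" + i);
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 10; i++) threads.add(new Thread(SyncFixDemo::crawl));
        for (Thread t : threads) t.start();
        for (Thread t : threads) t.join();
        // Every link was claimed exactly once: no duplicates, none missed.
        System.out.println(claimed.size() == 50 && new HashSet<>(claimed).size() == 50);
    }
}
```

Note the lock only closes the race between threads of one JVM; two separate crawler processes would still need a database-level mechanism (e.g. row locking) to coordinate.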
Thanks to the helper "Fuber" on IRC (QuakeNet #java and Freenode ##javaee), and to ALL who supported me!
Upvotes: 0
Reputation: 1967
This is a situation where a concise question is not being asked. There is lots of code in there and you have no idea what is going on. You need to break it down so that you can understand where it is going wrong, then show us that bit.
Some potential points of conflict follow. I did not read the logs because they are useless to me.
-edit for comment- Databases generally have transactions. Modifications you make in one transaction are not seen by other transactions until they are committed, and transactions can be rolled back. You'll need to fetch the row you just updated and check whether the value has really changed; do this in another transaction or on another connection.
The gap of 20 seconds only applies when the process is started. Imagine a situation where Thread1 processes URL1 and Thread2 processes URL2, and they both finish at about the same time. They both look for the next unprocessed URL (say URL3), and both start processing it, because neither knows another thread has started it. You need one process handing out the URLs; a queue is possibly what you want to look at.
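One way to sketch that queue idea (all names here are illustrative, not from the original code): a single producer enqueues URLs, and `BlockingQueue.take()` hands each URL to exactly one worker, so duplicate crawling becomes impossible by construction.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueDemo {
    static final String POISON = "STOP"; // sentinel telling a worker to quit

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> urls = new LinkedBlockingQueue<>();
        ConcurrentLinkedQueue<String> crawled = new ConcurrentLinkedQueue<>();

        int workers = 10;
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> {
                try {
                    String url;
                    // take() removes the URL atomically: only this worker sees it.
                    while (!(url = urls.take()).equals(POISON)) {
                        crawled.add(url); // "crawl" the url
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            threads.add(t);
        }

        // The single dispensing side: in the real app this would read
        // processed = false rows from PostgreSQL; here we enqueue fake URLs.
        for (int i = 0; i < 100; i++) urls.put("http://example.com/" + i);
        for (int i = 0; i < workers; i++) urls.put(POISON);
        for (Thread t : threads) t.join();

        // Each URL was crawled exactly once.
        System.out.println(crawled.size() == 100 && new HashSet<>(crawled).size() == 100);
    }
}
```

With this shape, no per-URL locking is needed in the workers at all; the only coordination point is the queue.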
Logging would be improved if you recorded which threads were working on which URLs. You also need a smaller sample size so that you can get your head around what is going on.
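For the logging suggestion, even a tiny helper like the hypothetical one below, which tags every message with the worker thread's name and the URL it holds, would make this kind of race visible in the logs:

```java
public class LogDemo {
    // Format: [thread-name] event url
    static String logLine(String event, String url) {
        return String.format("[%s] %s %s", Thread.currentThread().getName(), event, url);
    }

    public static void main(String[] args) {
        Thread.currentThread().setName("crawler-1");
        System.out.println(logLine("fetched", "http://example.com/a"));
        // prints: [crawler-1] fetched http://example.com/a
    }
}
```

Two identical URLs appearing under two different thread names would then point straight at the fetch/mark gap.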
Upvotes: 2