Reputation: 1143
I have a list of about 10 different URLs whose content I need. I have written a program that gets the content of one URL, but I am unable to make it work with multiple URLs.
I've studied lots of tutorials on threads in Java, but I haven't been able to find an answer.
In my case, the URLs are like www.example1.com, www.example2.com, www.example3.com, www.example4.com.
I want to make a thread for each URL and run them all at the same time.
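import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class HtmlParser {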
public static int searchedPageCount = 0,
skippedPageCount = 0,
productCount = 0;
public static void main(String[] args) {
List<String> URLs = new LinkedList<String>();
long t1 = System.currentTimeMillis();
URLs.add("http://www.example.com"); // Jsoup.connect expects the protocol in the URL
int i = 0;
for (ListIterator<String> iterator = URLs.listIterator(); i < URLs.size();) {
i++;
System.out.println("While loop");
List<String> nextLevelURLs = processURL(URLs.get(iterator
.nextIndex()));
for (String URL : nextLevelURLs) {
if (!URLs.contains(URL)) {
System.out.println(URL);
iterator.add(URL);
}
}
System.out.println(URLs.size());
}
System.out.println("Total products found: " + productCount);
System.out.println("Total searched page: " + searchedPageCount);
System.out.println("Total skipped page: " + skippedPageCount);
long t2 = System.currentTimeMillis();
System.out.println("Total time taken: " + (t2 - t1) / 60000);
}
public static List<String> processURL(String URL) {
List<String> nextLevelURLs = new ArrayList<String>();
try {
searchedPageCount++;
// System.out.println("Current URL: " + URL);
Elements products = Jsoup.connect(URL).timeout(60000).get()
.select("div.product");
for (Element product : products) {
System.out.println(product.select(" a > h2").text());
System.out.println(product.select(" a > h3").text());
System.out.println(product.select(".product > a").attr("href"));
System.out
.println(product.select(".image a > img").attr("src"));
System.out.println(product.select(".price").text());
System.out.println();
productCount++;
}
// System.out.println("Total products found until now: " +
// productCount);
Elements links = Jsoup.connect(URL).timeout(60000).get()
.select("a[href]");
for (Element link : links) {
URL = link.attr("href");
if (URL.startsWith("http://www.example.com/")) {
// System.out.println("URLs added.");
nextLevelURLs.add(URL);
} else {
skippedPageCount++;
// System.out.println("URL skipped: " + URL);
}
}
} catch (Exception e) {
e.printStackTrace();
}
return nextLevelURLs;
}
}
Upvotes: 0
Views: 325
Reputation: 2647
Unfortunately, there is no way to start two threads at the same time.
Let me explain: first of all, in the sequence thread1.start(); thread2.start(); the call on thread1 executes first and the call on thread2 after it. This means only that thread1 is scheduled before thread2, not that it actually starts running first. Each call takes a fraction of a second, so a human observer cannot see that they happen in sequence.
Furthermore, Java threads are scheduled, i.e. assigned to be executed eventually. Even with a multi-core CPU, you cannot be sure that 1) the threads run in parallel (other system processes may interfere) or 2) each thread starts running immediately after its start() method is called.
But you can run multiple threads this way, where thread1 and thread2 are Runnable instances:
new Thread(thread1).start();
new Thread(thread2).start();
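To make the point about scheduling concrete, here is a minimal sketch that starts one thread per URL and then waits for all of them with join(). The class name StartAndJoinDemo, the process method, and the URLs are hypothetical stand-ins, not part of the original code; the real per-URL work (for example the Jsoup fetch) would go where process is. It uses an AtomicInteger because plain static int counters, like the ones in the question, are not safe to increment from several threads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class StartAndJoinDemo {
    // Shared counter; AtomicInteger keeps increments safe across threads.
    static final AtomicInteger processed = new AtomicInteger();

    // Hypothetical stand-in for the real per-URL work.
    static void process(String url) {
        processed.incrementAndGet();
    }

    // Starts one thread per URL, then blocks until every thread has finished.
    static int runAll(List<String> urls) {
        processed.set(0);
        List<Thread> threads = new ArrayList<>();
        for (String url : urls) {
            Thread t = new Thread(() -> process(url));
            threads.add(t);
            t.start(); // requests a start; when it actually runs is up to the scheduler
        }
        for (Thread t : threads) {
            try {
                t.join(); // wait for this thread to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                break;
            }
        }
        return processed.get();
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
                "http://www.example1.com", "http://www.example2.com");
        System.out.println("Processed: " + runAll(urls));
    }
}
```

The order in which the threads actually run is not guaranteed, but after the join() calls return, all of them have completed.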
Upvotes: 3
Reputation: 835
First of all, the code you pasted is hard to adapt because it is written as a single procedural flow. Turn it into object-oriented form and then extend Thread (or implement Runnable), like this:
public class URLProcessor extends Thread {
private String url;
public URLProcessor(String url) {
this.url = url;
}
@Override
public void run() {
//your business logic to parse the site with "this.url" here
}
}
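Then, in the main entry point, start one processor per URL:
public static void main(String[] args) {
// placeholder URLs; load your own list from wherever it is stored
List<String> allmyurls = java.util.Arrays.asList("http://www.example1.com", "http://www.example2.com");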
for (String url : allmyurls) {
URLProcessor p = new URLProcessor(url);
p.start();
}
}
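As an alternative to subclassing Thread, the same one-task-per-URL idea can be sketched with a thread pool from java.util.concurrent, which also makes it easy to get a return value back from each task. This is a different technique from the one above, not a rewrite of it. The class name PooledUrlProcessor and the processUrl method are assumptions made for illustration; processUrl just returns a string where the real parsing logic would go.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PooledUrlProcessor {
    // Hypothetical stand-in for the page-parsing logic.
    static String processUrl(String url) {
        return "processed " + url;
    }

    // Submits one task per URL and returns the results in input order.
    static List<String> processAll(List<String> urls, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        List<String> results = new ArrayList<>();
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String url : urls) {
                tasks.add(() -> processUrl(url));
            }
            // invokeAll blocks until every task has completed.
            for (Future<String> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown(); // let the worker threads exit
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
                "http://www.example1.com", "http://www.example2.com");
        for (String result : processAll(urls, 4)) {
            System.out.println(result);
        }
    }
}
```

With about 10 URLs, a small fixed pool keeps the number of simultaneous connections bounded while still overlapping the network waits.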
Upvotes: 0
Reputation: 44834
Basically, create a class that implements Runnable and put the code that deals with one URL in its run() method. In your main class, for each URL, construct an instance with the information it needs (e.g. the URL), wrap it in a Thread, and start it.
There are plenty of sites that teach multi-threading in Java.
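A minimal sketch of that description, assuming a hypothetical UrlTask class: each instance holds one URL, does its work in run(), and stores a result that the main thread reads after joining. The body of run() is a placeholder for the real fetch-and-parse code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class UrlTask implements Runnable {
    private final String url;
    private volatile String result; // written by the worker, read after join()

    UrlTask(String url) {
        this.url = url;
    }

    @Override
    public void run() {
        // Hypothetical placeholder: fetch and parse this.url here.
        result = "done: " + url;
    }

    String getResult() {
        return result;
    }

    // Runs one UrlTask per URL on its own thread; results are in input order.
    static List<String> runAll(List<String> urls) {
        List<UrlTask> tasks = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (String url : urls) {
            UrlTask task = new UrlTask(url);
            Thread thread = new Thread(task);
            tasks.add(task);
            threads.add(thread);
            thread.start();
        }
        List<String> results = new ArrayList<>();
        for (int i = 0; i < threads.size(); i++) {
            try {
                threads.get(i).join(); // wait before reading the result
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                break;
            }
            results.add(tasks.get(i).getResult());
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(runAll(Arrays.asList(
                "http://www.example1.com", "http://www.example2.com")));
    }
}
```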
Upvotes: 0