Mayur

Reputation: 1143

Start multiple threads at the same time

I have a list of about 10 different URLs whose content I need. My program can fetch the content of one URL, but I cannot get it to work with multiple URLs.

I've studied lots of tutorials on threads in Java but I'm unable to find an answer.

In my case, the URLs look like www.example1.com, www.example2.com, www.example3.com, www.example4.com.

I want to create a thread for each URL and run them all at the same time.

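import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class HtmlParser {
    public static int searchedPageCount = 0,
                      skippedPageCount = 0,
                      productCount = 0;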

public static void main(String[] args) {

    List<String> URLs = new LinkedList<String>();

    long t1 = System.currentTimeMillis();

    // Jsoup.connect needs the protocol, and processURL only follows links with this prefix
    URLs.add("http://www.example.com/");

    int i = 0;
    for (ListIterator iterator = URLs.listIterator(); i < URLs.size();) {
        i++;
        System.out.println("While loop");
        List<String> nextLevelURLs = processURL(URLs.get(iterator
                .nextIndex()));
        for (String URL : nextLevelURLs) {
            if (!URLs.contains(URL)) {
                System.out.println(URL);
                iterator.add(new String(URL));
            }
        }
        System.out.println(URLs.size());
    }

    System.out.println("Total products found: " + productCount);
    System.out.println("Total searched page: " + searchedPageCount);
    System.out.println("Total skipped page: " + skippedPageCount);

    long t2 = System.currentTimeMillis();
    System.out.println("Total time taken (minutes): " + (t2 - t1) / 60000);
}

public static List<String> processURL(String URL) {

    List<String> nextLevelURLs = new ArrayList<String>();

    try {
        searchedPageCount++;
        // System.out.println("Current URL: " + URL);
        Elements products = Jsoup.connect(URL).timeout(60000).get()
                .select("div.product");

        for (Element product : products) {

            System.out.println(product.select(" a > h2").text());
            System.out.println(product.select(" a > h3").text());
            System.out.println(product.select(".product > a").attr("href"));
            System.out
                    .println(product.select(".image a > img").attr("src"));
            System.out.println(product.select(".price").text());
            System.out.println();

            productCount++;

        }

        // System.out.println("Total products found until now: " +
        // productCount);
        Elements links = Jsoup.connect(URL).timeout(60000).get()
                .select("a[href]");

        for (Element link : links) {
            URL = link.attr("href");
            if (URL.startsWith("http://www.example.com/")) {
                // System.out.println("URLs added.");
                nextLevelURLs.add(URL);
            } else {
                skippedPageCount++;
                // System.out.println("URL skipped: " + URL);
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return nextLevelURLs;
}

}

Upvotes: 0

Views: 325

Answers (3)

Mitul Maheshwari

Reputation: 2647

Unfortunately, there is no way to start two threads at exactly the same time.

Let me explain: the calls thread1.start(); and thread2.start(); execute in sequence, thread1 first and then thread2. That only means thread1 is scheduled before thread2, not that it actually starts running first. Each call takes a fraction of a second, so a human observer cannot tell that they happen one after the other.

Moreover, Java threads are scheduled, i.e. queued to be executed eventually. Even with a multi-core CPU, you cannot be sure that 1) the threads run in parallel (other system processes may interfere) or 2) both threads begin running right after start() is called.

But you can run multiple threads this way:

new Thread(thread1).start();
new Thread(thread2).start();
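
If you want the threads to begin their work as close together as the scheduler allows, one common approach is a start gate built from java.util.concurrent.CountDownLatch. The following is only a minimal sketch under assumptions: the class name StartGateDemo, the method runAll and the placeholder work inside each Runnable are made up for illustration, and it still cannot guarantee a truly simultaneous start.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

public class StartGateDemo {

    // Starts one thread per URL, holds the workers at a shared gate,
    // opens the gate once, and waits until every worker has finished.
    public static List<String> runAll(List<String> urls) throws InterruptedException {
        CountDownLatch startGate = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(urls.size());
        List<String> handled = new CopyOnWriteArrayList<>();

        for (String url : urls) {
            Runnable worker = () -> {
                try {
                    startGate.await();   // wait here until the gate is opened
                    handled.add(url);    // placeholder for the real per-URL work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();    // signal that this worker is finished
                }
            };
            new Thread(worker).start();
        }

        // Release every worker already waiting; late arrivals pass straight through.
        startGate.countDown();
        done.await();
        return new ArrayList<>(handled);
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical URLs, modelled on the ones in the question.
        List<String> urls = Arrays.asList(
                "http://www.example1.com", "http://www.example2.com");
        System.out.println(runAll(urls));
    }
}
```

The order in which the URLs end up in the returned list is not fixed, which is exactly the scheduling uncertainty described above.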

Upvotes: 3

LynxZh

Reputation: 835

First of all, the code you pasted is hard to parallelize because it is written in a purely procedural style. Turn it into object-oriented form and then extend Thread (or implement Runnable), like this:

public class URLProcessor extends Thread {
    private String url;
    public URLProcessor(String url) {
        this.url = url;
    }

    @Override
    public void run() {
        // your business logic to parse the site with "this.url" goes here
    }
}

Then, in the main method, create and start one processor per URL:

public static void main(String[] args) {
    // Fill this list with your URLs from somewhere (needs java.util.List and java.util.ArrayList)
    List<String> allmyurls = new ArrayList<String>();
    for (String url : allmyurls) {
        URLProcessor p = new URLProcessor(url);
        p.start();
    }
}
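
Note that the loop above starts the threads but never waits for them, and the static int counters in the question are not safe to update from several threads at once. Here is a rough sketch of one way to handle both, assuming you wait with Thread.join() and count with an AtomicInteger. The class JoinDemo, the method processAll and the placeholder body of run() are hypothetical and stand in for your real parsing code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class JoinDemo {

    static class URLProcessor extends Thread {
        private final String url;
        private final AtomicInteger searchedPageCount;

        URLProcessor(String url, AtomicInteger searchedPageCount) {
            this.url = url;
            this.searchedPageCount = searchedPageCount;
        }

        @Override
        public void run() {
            // Placeholder: the real code would fetch and parse this.url here.
            // incrementAndGet() is atomic, so no update is lost between threads.
            searchedPageCount.incrementAndGet();
        }
    }

    // Starts one thread per URL, waits for all of them, and returns the page count.
    public static int processAll(List<String> urls) throws InterruptedException {
        AtomicInteger searchedPageCount = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();

        for (String url : urls) {
            Thread t = new URLProcessor(url, searchedPageCount);
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();   // block until this thread has finished
        }
        return searchedPageCount.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical URLs, modelled on the ones in the question.
        List<String> urls = Arrays.asList(
                "http://www.example1.com", "http://www.example2.com",
                "http://www.example3.com", "http://www.example4.com");
        System.out.println("Total searched pages: " + processAll(urls));
    }
}
```

Only after every join() has returned is it safe to print the totals, as the question's main method does at the end.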

Upvotes: 0

Scary Wombat

Reputation: 44834

Basically, create a class that implements Runnable and put the code that deals with one URL in its run() method. In your main class, for each URL, construct an instance with the information it needs (e.g. the URL) and then run it on its own thread, for example by passing it to new Thread(...) and calling start().

Plenty of sites teach multi-threaded Java.
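
As a rough illustration of the Runnable approach, here is a sketch that hands the tasks to a fixed-size thread pool (java.util.concurrent.ExecutorService) instead of creating each Thread by hand. That is a swapped-in variation, not something required by the description above. The names PoolDemo, UrlTask and processAll, the pool size of 4 and the one-minute timeout are all assumptions, and the task stores the URL's length as a stand-in for real page content.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolDemo {

    // One task per URL; it carries everything it needs to do its work.
    static class UrlTask implements Runnable {
        private final String url;
        private final Map<String, Integer> results;

        UrlTask(String url, Map<String, Integer> results) {
            this.url = url;
            this.results = results;
        }

        @Override
        public void run() {
            // Placeholder: store the URL's length instead of real page content.
            results.put(url, url.length());
        }
    }

    public static Map<String, Integer> processAll(List<String> urls)
            throws InterruptedException {
        Map<String, Integer> results = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(4); // assumed pool size

        for (String url : urls) {
            pool.execute(new UrlTask(url, results));
        }

        pool.shutdown();                                    // accept no new tasks
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {  // assumed timeout
            pool.shutdownNow();                             // give up on stragglers
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical URLs, modelled on the ones in the question.
        List<String> urls = Arrays.asList(
                "http://www.example1.com", "http://www.example2.com",
                "http://www.example3.com");
        System.out.println(processAll(urls));
    }
}
```

A pool caps how many requests run at once, which may matter if the list grows well beyond 10 URLs.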

Upvotes: 0
