Peck3277

Reputation: 1423

Confusion with threads, beginner

I'm making a simple program to scrape content from several webpages. I want to speed it up, so I'd like to use threads. I want to be able to control the number of threads with an integer (down the line I want users to be able to define this).

This is the code I want to create threads for:

public void runLocales(String langLocale){
    ParseXML parser = new ParseXML(langLocale);
    int statusCode = parser.getSitemapStatus();
    if (statusCode > 0){
        for (String page : parser.getUrls()){
            urlList.append(page+"\n");
        }
    }else {
        urlList.append("Connection timed out");
    }
}

And the ParseXML class:

public class ParseXML {
private String sitemapPath;
private String sitemapName = "sitemap.xml";
private String sitemapDomain = "somesite";
Connection.Response response = null;
boolean success = false;

ParseXML(String langLocale){
    sitemapPath = sitemapDomain+"/"+langLocale+"/"+sitemapName;
    int i = 0;
    int retries = 3;

    while (i < retries){
        try {
            response = Jsoup.connect(sitemapPath)
                    .userAgent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21")
                    .timeout(10000)
                    .execute();
            success = true;
            break;
        } catch (IOException e) {

        }
        i++;
    }
}

public int getSitemapStatus(){
    if(success){
        int statusCode = response.statusCode();
        return statusCode;
    }else {
        return 0;
    }
}

public ArrayList<String> getUrls(){
    ArrayList<String> urls = new ArrayList<String>();
    try {
        Document doc = response.parse();

        Elements element = doc.select("loc");
        for (Element page : element){
            urls.add(page.text());
        }           
        return urls;
    } catch (IOException e) {
        System.out.println(e);
        return null;
    }
}   
}

I've been reading up on threads for a few days now and I can't figure out how to implement threading in my case. Can someone offer some insight, please?

Upvotes: 0

Views: 181

Answers (4)

Icarus

Reputation: 63966

Something like this should do:

new Thread(
        new Runnable() {
            public void run() {
                try {
                    runLocales(langLocale);
                } catch (Exception e) {
                    e.printStackTrace();
                }
                System.out.println(
                    "child thread  " + new Date(System.currentTimeMillis()));
            }
        }).start();

Obviously, you still need to add the code to control how many Threads you want to create, etc., and decide what you want to do if your threshold is reached.
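One way to add that control, sketched under a few assumptions: a `Semaphore` hands out a fixed number of permits, and the loop blocks when the threshold is reached until a running thread finishes. The names `BoundedThreads`, `runAll` and `task` are made up for illustration, and the lambdas need Java 8 or later.

```java
import java.util.List;
import java.util.concurrent.Semaphore;
import java.util.function.Consumer;

class BoundedThreads {
    // Runs task once per locale, with at most maxThreads running at the same time.
    // BoundedThreads, runAll and task are hypothetical names, not from the question.
    static void runAll(List<String> locales, int maxThreads, Consumer<String> task)
            throws InterruptedException {
        Semaphore permits = new Semaphore(maxThreads);
        for (String locale : locales) {
            // Threshold reached: block here until a running thread gives its permit back.
            permits.acquire();
            new Thread(() -> {
                try {
                    task.accept(locale);
                } finally {
                    permits.release();
                }
            }).start();
        }
        // All permits are only available again once every thread has finished.
        permits.acquire(maxThreads);
        permits.release(maxThreads);
    }
}
```

With the method from the question this could be called as `BoundedThreads.runAll(locales, 4, this::runLocales);`. The type of `urlList` is not shown in the question, so if it is not safe for concurrent use, the appends would need to be synchronized.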

Upvotes: 1

Bhavik Ambani

Reputation: 6657

You can use a ThreadGroup to manage the threads you want to maintain, or you can use a ThreadPool to control how many threads run.

You can find help on using the ThreadGroup class here.

And a sample ThreadPool implementation is here.
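For the thread pool route, the JDK already ships one in `java.util.concurrent`, so a hand-written pool is often not needed. A rough sketch, assuming Java 8 or later: `LocalePool`, `fetchAll` and `fetchUrls` are hypothetical names, and `fetchUrls` stands in for the ParseXML work from the question. Results are returned to the caller instead of being appended from the worker threads, since the type of `urlList` is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class LocalePool {
    // Fetches the URLs for every locale using at most threadCount worker threads.
    // fetchUrls is a placeholder for whatever does the real work (hypothetical).
    static List<String> fetchAll(List<String> locales, int threadCount,
                                 Function<String, List<String>> fetchUrls)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        try {
            List<Future<List<String>>> pending = new ArrayList<>();
            for (String locale : locales) {
                Callable<List<String>> job = () -> fetchUrls.apply(locale);
                pending.add(pool.submit(job)); // queued if all workers are busy
            }
            List<String> allUrls = new ArrayList<>();
            for (Future<List<String>> result : pending) {
                allUrls.addAll(result.get()); // blocks until that locale is done
            }
            return allUrls; // same order as the input locales
        } finally {
            pool.shutdown(); // let the worker threads exit once the queue is empty
        }
    }
}
```

The `threadCount` argument is the single integer that could later come from user input.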

Hope this will help you.

Enjoy !!!

Upvotes: 1

Miquel

Reputation: 15675

Excuse me if I'm answering the obvious and your problem is different, but it looks like what you want is to define:

public class Runner implements Runnable { // Runnable is an interface, so implement it

    private final String langLocale;

    public Runner(String langLocale) {
        this.langLocale = langLocale;
    }

    @Override
    public void run() { // Instead of public void runLocales(String langLocale)
        // Do your thing here
    }
}

And then create and start new threads using new Thread(new Runner("smth")).start();

You will probably want to keep track of each thread so you can join it, which keeps you from having too many threads running at a time. Once you run into that problem, consider using a ThreadPool, where you hand in the Runnables directly.
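Keeping track of the threads and joining them could look roughly like this. It is only a sketch: `JoinInBatches`, `runInBatches` and `batchSize` are invented names. It starts a limited batch of threads, waits for all of them with `join()`, and only then starts the next batch.

```java
import java.util.ArrayList;
import java.util.List;

class JoinInBatches {
    // Runs the jobs on plain threads, never more than batchSize at a time.
    // JoinInBatches, runInBatches and batchSize are hypothetical names.
    static void runInBatches(List<Runnable> jobs, int batchSize) throws InterruptedException {
        for (int start = 0; start < jobs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, jobs.size());
            List<Thread> batch = new ArrayList<>();
            for (Runnable job : jobs.subList(start, end)) {
                Thread thread = new Thread(job);
                thread.start();
                batch.add(thread); // remember the thread so it can be joined
            }
            for (Thread thread : batch) {
                thread.join(); // wait for the whole batch before starting the next one
            }
        }
    }
}
```

A slow job holds up its whole batch here, which is one reason a pool tends to be the better fit once the number of jobs grows.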

And one last thing, when crawling, be a good citizen! Respect the recommendations, use the robots.txt file, don't open more than a couple of threads to the same server, etc...

Have fun!

Upvotes: 1
