Jaffer Wilson

Reputation: 7273

How to give file as input and work in multiple threads?

I have this code to get the HTTP status code of a URL:

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

/**
 * @author Crunchify.com
 * 
 */

 class j {
    public static void main(String args[]) throws Exception {

        String[] hostList = { "http://example.com", "http://example2.com","http://example3.com" };

        for (int i = 0; i < hostList.length; i++) {

            String url = hostList[i];
            String status = getStatus(url);

            System.out.println(url + "\t\tStatus:" + status);
        }
    }

    public static String getStatus(String url) throws IOException {

        String result = "";
        try {
            URL siteURL = new URL(url);
            HttpURLConnection connection = (HttpURLConnection) siteURL
                    .openConnection();
            connection.setRequestMethod("HEAD");
            connection.connect();

            int code = connection.getResponseCode();

                result = Integer.toString(code);

        } catch (Exception e) {
            result = "->Red<-";
        }
        return result;
    }
}

I have tested it with a small input and it works fine. But I have millions of domains to scan, and they are stored in a file.

  1. I want to know how I can give a file as input to this code.
  2. I want the code to work in multiple threads. Say the thread count should be more than 20000, so that I get the output faster.
  3. How can I write the output to another file?

Kindly help me. If possible, I would also like to know the most bandwidth-savvy method of doing the same job. I want to make the code faster anyway. How can I do these things with the code I have? Java version:

java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)

Upvotes: 1

Views: 116

Answers (4)

Tim

Reputation: 4274

This does what you want:

Input list file (c://lines.txt)

http://www.adam-bien.com/
http://stackoverflow.com/
http://www.dfgdfgdfgdfgdfgertwsgdfhdfhsru.de
http://www.google.de

The task run by each thread (a Callable):

import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.Callable;

public class StatusThread implements Callable<String> {

    String url;

    public StatusThread(String url) {
        this.url = url;
    }

    @Override
    public String call() throws Exception {

        String result = "";
        try {
            URL siteURL = new URL(url);
            HttpURLConnection connection = (HttpURLConnection) siteURL.openConnection();
            connection.setRequestMethod("HEAD");
            connection.connect();

            int code = connection.getResponseCode();

            result = Integer.toString(code);

        } catch (Exception e) {
            result = "->Red<-";
        }
        return url + "|" + result;
    }
}

And the main program:

import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.stream.Stream;

public class CallableExample {
    public static void main(String[] args) throws IOException {

        // Number of threads
        int numberOfThreads = 10;

        // Input file
        String sourceFileName = "c://lines.txt"; // Replace by your own
        String targetFileName = "c://output.txt"; // Replace by your own

        // Read input file into List    
        ArrayList<String> urls = new ArrayList<>();
        try (Stream<String> stream = Files.lines(Paths.get(sourceFileName ))) {
            stream.forEach((string) -> {
                urls.add(string);
            });

        } catch (IOException e) {
            e.printStackTrace();
        }

        // Create thread pool
        ThreadPoolExecutor executor = (ThreadPoolExecutor) Executors.newFixedThreadPool(numberOfThreads);
        List<Future<String>> resultList = new ArrayList<>();

        // Launch threads
        for(String url : urls) {
            StatusThread statusGetter = new StatusThread(url);
            Future<String> result = executor.submit(statusGetter);
            resultList.add(result);
        }

        // Use results; try-with-resources closes the writer even if writing fails
        try (FileWriter writer = new FileWriter(targetFileName)) {
            for (Future<String> future : resultList) {
                try {
                    // Each result has the form "url|status"
                    String[] parts = future.get().split("\\|");
                    String oneResult = parts[0] + " -> " + parts[1];

                    // Print the result to the console
                    System.out.println(oneResult);

                    // Write the result to the file
                    writer.write(oneResult + System.lineSeparator());

                } catch (InterruptedException | ExecutionException e) {
                    e.printStackTrace();
                }
            }
        }


        // Shut down the executor service
        executor.shutdown();
    }
}

Don't forget to:

  • Create your input file and point to it (c://lines.txt)
  • Change the number of threads to get the best result

Upvotes: 1

blackwizard

Reputation: 2044

I agree with the thread pool approach proposed here. Multi-threading pays off by exploiting the time other threads spend waiting (in this case, I guess, for the remote site's response). It does not multiply processing power. So about 10 threads seems reasonable (more, depending on the hardware).
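
To illustrate the point about waiting time, here is a small sketch that does not touch the network: each task just sleeps to simulate waiting for a remote server. The class name WaitOverlapDemo, the task count and the 100 ms delay are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class WaitOverlapDemo {

    // Runs `tasks` simulated requests, each blocking for `waitMillis`,
    // on a pool of `threads` threads and returns the elapsed time in ms.
    static long elapsedMillis(int threads, int tasks, long waitMillis) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Callable<Void>> work = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            work.add(() -> {
                Thread.sleep(waitMillis); // stands in for waiting on a remote server
                return null;
            });
        }
        long start = System.nanoTime();
        pool.invokeAll(work); // blocks until every task has finished
        long elapsed = (System.nanoTime() - start) / 1_000_000;
        pool.shutdown();
        return elapsed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("1 thread:   " + elapsedMillis(1, 20, 100) + " ms");
        System.out.println("10 threads: " + elapsedMillis(10, 20, 100) + " ms");
    }
}
```

With a single thread the waits add up; with ten threads they overlap, so the second figure should come out much smaller even though no extra processing power is involved.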

An important point that seems to have been neglected in the answers I read is that the OP is talking about millions of domains. So I would discourage loading the whole file into memory as a list that is iterated over afterwards. I would rather merge everything into a single loop (the file reading) instead of three (read, ping, write).

stream.forEach((url) -> {
     StatusThread statusGetter = new StatusThread(url, outputWriter);
     Future<String> result = executor.submit(statusGetter);
});

outputWriter would be a type with a synchronized method to write into an output stream.
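
A fuller sketch of this idea, under some assumptions: SynchronizedLineWriter, StreamingStatusCheck and the maxInFlight parameter are names invented here, not part of the snippet above. Instead of a Callable class, the status check is passed in as a function. The semaphore is an addition of mine: a fixed thread pool has an unbounded queue, so without a cap the submitted tasks would pile up in memory just like the list did.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import java.util.stream.Stream;

// Hypothetical stand-in for "outputWriter": one synchronized method guards the stream.
class SynchronizedLineWriter implements AutoCloseable {
    private final BufferedWriter out;

    SynchronizedLineWriter(Path target) throws IOException {
        this.out = Files.newBufferedWriter(target); // UTF-8 by default
    }

    synchronized void writeLine(String line) {
        try {
            out.write(line);
            out.newLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public synchronized void close() throws IOException {
        out.close();
    }
}

class StreamingStatusCheck {

    // Reads one URL per line and submits it right away, so the file is never held in memory.
    // maxInFlight caps queued plus running tasks (an assumption of this sketch).
    static void process(Path source, Path target, int threads, int maxInFlight,
                        Function<String, String> check) throws IOException, InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        Semaphore inFlight = new Semaphore(maxInFlight);
        try (SynchronizedLineWriter writer = new SynchronizedLineWriter(target);
             Stream<String> lines = Files.lines(source)) {
            lines.forEach(url -> {
                inFlight.acquireUninterruptibly(); // pauses reading while the workers are saturated
                executor.submit(() -> {
                    try {
                        writer.writeLine(url + " -> " + check.apply(url));
                    } finally {
                        inFlight.release();
                    }
                });
            });
            executor.shutdown();
            // Wait for the workers before the writer is closed
            executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
        } finally {
            executor.shutdownNow(); // no-op after a normal run; stops workers if reading failed
        }
    }

    // Same HEAD request as in the question; any failure is reported as "error".
    static String headStatus(String url) {
        try {
            HttpURLConnection c = (HttpURLConnection) new URL(url).openConnection();
            c.setRequestMethod("HEAD");
            return Integer.toString(c.getResponseCode());
        } catch (Exception e) {
            return "error";
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0] = input file, args[1] = output file (placeholders for your own paths)
        process(Paths.get(args[0]), Paths.get(args[1]), 10, 100, StreamingStatusCheck::headStatus);
    }
}
```

Note that the output lines appear in completion order, not in input order, which is why each line carries its URL.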

Upvotes: 1

davidxxx

Reputation: 131346

You can use an ExecutorService and set the number of threads it uses.
The ExecutorService instance handles thread management for you.
You just need to give it the tasks to execute and invoke them all.

When all the tasks have completed, you can get the results.
In the call() method of the Callable implementation, we return a String in which a separator divides the URL from the response code of the request.
For example: http://example3.com||301, http://example.com||200, etc.

I have not written the code to read the file or to store the task results in another file. You should not have much difficulty implementing it.

Here is the main class:

import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class Main {

    public static void main(String[] args) throws InterruptedException {

        String[] hostList = { "http://example.com", "http://example2.com", "http://example3.com" };

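        // Leave one core free, but never ask for fewer than one thread
        // (newFixedThreadPool(0) throws IllegalArgumentException on a single-core machine)
        int nbThreadToUse = Math.max(1, Runtime.getRuntime().availableProcessors() - 1);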
        ExecutorService executorService = Executors.newFixedThreadPool(nbThreadToUse);
        Set<Callable<String>> callables = new HashSet<Callable<String>>();
        for (String host : hostList) {
            callables.add(new UrlCall(host));
        }

        List<Future<String>> futures = executorService.invokeAll(callables);

        for (Future<String> future : futures) {
            try {
                String result = future.get();
                String[] keyValueToken = result.split("\\|\\|");
                String url = keyValueToken[0];
                String response = keyValueToken[1];
                System.out.println("url=" + url + ", response=" + response);

            } catch (ExecutionException e) {
                e.printStackTrace();
            }
        }

        executorService.shutdown();
    }

}

Here is UrlCall, the Callable implementation to perform a call to the url. UrlCall takes in its constructor the url to test.

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.Callable;

public class UrlCall implements Callable<String> {

    private String url;

    public UrlCall(String url) {
        this.url = url;
    }

    @Override
    public String call() throws Exception {
        return getStatus(url);
    }

    private String getStatus(String url) throws IOException {

        try {
            URL siteURL = new URL(url);
            HttpURLConnection connection = (HttpURLConnection) siteURL.openConnection();
            connection.setRequestMethod("HEAD");
            connection.connect();

            int code = connection.getResponseCode();
            return url + "||" + code;

        } catch (Exception e) {
            //FIXME to log of course
            return url + "||exception";
        }

    }

}
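
The file reading and writing that this answer leaves out could be sketched roughly as follows. This is only one possible way to do it: FileBatchRunner, the batchSize parameter and the taskFor factory are names invented for the sketch. It calls invokeAll batch by batch so that millions of lines are never all in memory at once, and it uses a List rather than a Set so that results come back in input order.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class FileBatchRunner {

    // Turns "url||code" (the format returned by call() above) into an output line.
    static String formatResult(String raw) {
        String[] parts = raw.split("\\|\\|", 2);
        String response = parts.length > 1 ? parts[1] : "unknown";
        return "url=" + parts[0] + ", response=" + response;
    }

    static void run(Path source, Path target, int threads, int batchSize,
                    Function<String, Callable<String>> taskFor)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (BufferedReader reader = Files.newBufferedReader(source);
             BufferedWriter writer = Files.newBufferedWriter(target)) {
            List<Callable<String>> batch = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                String url = line.trim();
                if (url.isEmpty()) {
                    continue; // skip blank lines
                }
                batch.add(taskFor.apply(url));
                if (batch.size() >= batchSize) {
                    runBatch(pool, batch, writer);
                }
            }
            runBatch(pool, batch, writer); // remaining partial batch
        } finally {
            pool.shutdown();
        }
    }

    // Executes one batch, writes its results in submission order, then empties the batch.
    private static void runBatch(ExecutorService pool, List<Callable<String>> batch,
                                 BufferedWriter writer) throws IOException, InterruptedException {
        for (Future<String> future : pool.invokeAll(batch)) {
            try {
                writer.write(formatResult(future.get()));
            } catch (ExecutionException e) {
                writer.write("task failed: " + e.getCause());
            }
            writer.newLine();
        }
        batch.clear();
    }

    // Same HEAD request as UrlCall, reduced to the status string.
    static String headStatus(String url) {
        try {
            HttpURLConnection c = (HttpURLConnection) new URL(url).openConnection();
            c.setRequestMethod("HEAD");
            return Integer.toString(c.getResponseCode());
        } catch (Exception e) {
            return "exception";
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0] = input file, args[1] = output file (placeholders for your own paths)
        run(Paths.get(args[0]), Paths.get(args[1]), 10, 1000,
            url -> () -> url + "||" + headStatus(url));
    }
}
```

The trade-off of batching is that each batch waits for its slowest request before the next one starts, so a larger batchSize keeps the threads busier at the cost of more memory.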

Upvotes: 1

Ian

Reputation: 466

You will have issues sharing a file across threads. It is much better to read the file in one place and then hand each record to a thread for processing.

Creating a thread is non-trivial resource-wise, so a thread pool would be useful so that threads can be reused.

Do you want all threads to write to a single file?

I would do that using a list shared between the worker threads and a single writer. Others may have a better idea.

How to do all this depends on Java version.
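
For Java 8 (the version in the question), the shared structure between the workers and the writer could be a BlockingQueue instead of a plain list. The following is only a sketch of that idea: QueueWriterSketch and the DONE marker are names I made up, and the status check is passed in as a function so the plumbing can be tried without network access.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

class QueueWriterSketch {
    // Unique marker object, compared by identity, so no real result can be mistaken for it.
    private static final String DONE = new String("DONE");

    static void run(List<String> urls, Path target, int threads,
                    Function<String, String> check) throws InterruptedException {
        BlockingQueue<String> results = new LinkedBlockingQueue<>();
        ExecutorService workers = Executors.newFixedThreadPool(threads);

        // Only this thread touches the file, so the file itself needs no locking.
        Thread writerThread = new Thread(() -> {
            try (BufferedWriter out = Files.newBufferedWriter(target)) {
                String line;
                while ((line = results.take()) != DONE) {
                    out.write(line);
                    out.newLine();
                }
            } catch (IOException | InterruptedException e) {
                e.printStackTrace();
            }
        });
        writerThread.start();

        // Pooled threads are reused: one task per URL, not one thread per URL.
        for (String url : urls) {
            workers.submit(() -> {
                results.add(url + " -> " + check.apply(url));
            });
        }

        workers.shutdown();
        workers.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
        results.add(DONE);   // all results are queued; tell the writer to finish
        writerThread.join(); // the file is complete once the writer has exited
    }

    // Same HEAD request as in the question; any failure is reported as "error".
    static String headStatus(String url) {
        try {
            HttpURLConnection c = (HttpURLConnection) new URL(url).openConnection();
            c.setRequestMethod("HEAD");
            return Integer.toString(c.getResponseCode());
        } catch (Exception e) {
            return "error";
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0] = input file, args[1] = output file (placeholders for your own paths)
        List<String> urls = Files.readAllLines(Paths.get(args[0]));
        run(urls, Paths.get(args[1]), 10, QueueWriterSketch::headStatus);
    }
}
```

Like the approach described above, this reads the whole input first, so for millions of lines you may prefer to feed the pool while reading.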

Upvotes: 1
