Reputation: 7273
I have this code to get the HTTP status code of a URL:
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

/**
 * @author Crunchify.com
 *
 */
class j {
    public static void main(String args[]) throws Exception {
        String[] hostList = { "http://example.com", "http://example2.com", "http://example3.com" };
        for (int i = 0; i < hostList.length; i++) {
            String url = hostList[i];
            String status = getStatus(url);
            System.out.println(url + "\t\tStatus:" + status);
        }
    }

    public static String getStatus(String url) throws IOException {
        String result = "";
        try {
            URL siteURL = new URL(url);
            HttpURLConnection connection = (HttpURLConnection) siteURL
                    .openConnection();
            connection.setRequestMethod("HEAD");
            connection.connect();
            int code = connection.getResponseCode();
            result = Integer.toString(code);
        } catch (Exception e) {
            result = "->Red<-";
        }
        return result;
    }
}
I have tested it with a small input and it works fine. But I have millions of domains to scan, stored in a file.
Kindly help me. If possible, I would also like to know the most bandwidth-efficient way to do the same job. In any case, I want to make the code faster. How can I do this with the code I have? Java version:
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
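On the bandwidth question: the code above already sends HEAD, which asks the server for headers only, so no page body is downloaded. A minimal sketch of a variant that also sets timeouts, assuming you want 3xx codes reported as-is rather than followed; the class name StatusChecker and the 5-second timeouts are hypothetical choices, not part of the question:

```java
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch: HEAD-based status check with explicit timeouts (values are hypothetical).
class StatusChecker {
    static final int CONNECT_TIMEOUT_MS = 5_000;
    static final int READ_TIMEOUT_MS = 5_000;

    static String getStatus(String url) {
        HttpURLConnection connection = null;
        try {
            connection = (HttpURLConnection) new URL(url).openConnection();
            // HEAD asks for headers only, so no response body is transferred.
            connection.setRequestMethod("HEAD");
            // Report 3xx codes instead of issuing a second request to the redirect target.
            connection.setInstanceFollowRedirects(false);
            // Without timeouts, one unresponsive host can block a thread indefinitely.
            connection.setConnectTimeout(CONNECT_TIMEOUT_MS);
            connection.setReadTimeout(READ_TIMEOUT_MS);
            return Integer.toString(connection.getResponseCode());
        } catch (Exception e) {
            // Malformed URL, unknown host, timeout, non-HTTP URL, ...
            return "->Red<-";
        } finally {
            if (connection != null) {
                // Release the socket; with millions of distinct hosts, reuse would not help.
                connection.disconnect();
            }
        }
    }
}
```

For example, StatusChecker.getStatus("not a url") returns "->Red<-" because the URL constructor rejects a string with no protocol.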
Upvotes: 1
Views: 116
Reputation: 4274
This does what you want:
Input list file (c://lines.txt)
http://www.adam-bien.com/
http://stackoverflow.com/
http://www.dfgdfgdfgdfgdfgertwsgdfhdfhsru.de
http://www.google.de
The task class (a Callable that the thread pool runs):
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.Callable;

public class StatusThread implements Callable<String> {
    String url;

    public StatusThread(String url) {
        this.url = url;
    }

    @Override
    public String call() throws Exception {
        String result = "";
        try {
            URL siteURL = new URL(url);
            HttpURLConnection connection = (HttpURLConnection) siteURL.openConnection();
            connection.setRequestMethod("HEAD");
            connection.connect();
            int code = connection.getResponseCode();
            result = Integer.toString(code);
        } catch (Exception e) {
            result = "->Red<-";
        }
        return url + "|" + result;
    }
}
And the main program:
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.stream.Stream;

public class CallableExample {
    public static void main(String[] args) throws IOException {
        // Number of threads
        int numberOfThreads = 10;
        // Input and output files
        String sourceFileName = "c://lines.txt"; // Replace by your own
        String targetFileName = "c://output.txt"; // Replace by your own

        // Read input file into List
        ArrayList<String> urls = new ArrayList<>();
        try (Stream<String> stream = Files.lines(Paths.get(sourceFileName))) {
            stream.forEach(urls::add);
        } catch (IOException e) {
            e.printStackTrace();
        }

        // Create thread pool
        ThreadPoolExecutor executor = (ThreadPoolExecutor) Executors.newFixedThreadPool(numberOfThreads);
        List<Future<String>> resultList = new ArrayList<>();

        // Submit one task per URL
        for (String url : urls) {
            StatusThread statusGetter = new StatusThread(url);
            Future<String> result = executor.submit(statusGetter);
            resultList.add(result);
        }

        // Use results; try-with-resources closes the writer even if an exception is thrown
        try (FileWriter writer = new FileWriter(targetFileName)) {
            for (Future<String> future : resultList) {
                try {
                    // get() blocks until this task has finished; call it once and reuse the value
                    String[] parts = future.get().split("\\|");
                    String oneResult = parts[0] + " -> " + parts[1];
                    // Print the result to the console
                    System.out.println(oneResult);
                    // Write the result to a file
                    writer.write(oneResult + System.lineSeparator());
                } catch (InterruptedException | ExecutionException e) {
                    e.printStackTrace();
                }
            }
        } finally {
            // Shut down the executor service
            executor.shutdown();
        }
    }
}
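The loop above reads results in submission order, so one slow host holds back the output of every faster one queued after it. If output order does not matter, java.util.concurrent's ExecutorCompletionService hands back futures as tasks finish. A sketch, where CompletionOrderRunner and the Consumer callback are hypothetical names introduced here:

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Sketch: consume results in the order tasks finish, not the order they were submitted.
class CompletionOrderRunner {
    static void runAll(List<Callable<String>> tasks, int threads, Consumer<String> onResult) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        CompletionService<String> completion = new ExecutorCompletionService<>(executor);
        try {
            for (Callable<String> task : tasks) {
                completion.submit(task);
            }
            for (int i = 0; i < tasks.size(); i++) {
                try {
                    // take() blocks until the next finished task is available.
                    // onResult is only ever called from this (the calling) thread.
                    onResult.accept(completion.take().get());
                } catch (ExecutionException e) {
                    e.printStackTrace();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and stop consuming
        } finally {
            executor.shutdown();
        }
    }
}
```

In CallableExample, the StatusThread instances would be the tasks and the callback would print and write each line; the output file then no longer follows the input order.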
Don't forget to:
Upvotes: 1
Reputation: 2044
I agree with the thread pool approach described here. Multi-threading helps here by exploiting the time the other threads spend waiting (in this case, I assume, for the remote site to respond); it does not multiply processing power. So about 10 threads seems reasonable (more, depending on hardware).
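A rough estimate shows why the thread count matters so much for an I/O-bound scan. Assume N URLs, k threads, and an average of t̄ seconds per request spent waiting on the network, and assume neither your bandwidth nor the remote servers become the bottleneck; the 0.5 s latency below is a hypothetical figure, not a measurement:

$$T \approx \frac{N \, \bar{t}}{k}$$

$$N = 10^6,\ \bar{t} = 0.5\,\mathrm{s}:\quad k = 10 \;\Rightarrow\; T \approx 5 \times 10^4\,\mathrm{s} \approx 13.9\,\mathrm{h};\qquad k = 100 \;\Rightarrow\; T \approx 5 \times 10^3\,\mathrm{s} \approx 1.4\,\mathrm{h}$$

Since the threads mostly sit idle while waiting, k can often be raised above the number of cores; the practical limits are memory per thread, open sockets, and bandwidth.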
An important point that seems to have been overlooked in the answers I read is that the OP talks about millions of domains. So I would discourage loading the whole file into memory as a list that is iterated over afterwards. I would rather merge everything into a single loop over the file, instead of three (read, ping, write):
stream.forEach((url) -> {
    // StatusThread would take the shared outputWriter and write its own result,
    // so there is no list of futures to iterate over afterwards
    executor.submit(new StatusThread(url, outputWriter));
});
outputWriter would be an instance of a type with a synchronized method that writes to an output stream.
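A minimal sketch of that single-loop idea. Two assumptions go beyond the answer: the names SynchronizedOutput and StreamingScan are hypothetical, and the status check is passed in as a Function<String, String> instead of a two-argument StatusThread constructor. It also closes one gap: Executors.newFixedThreadPool uses an unbounded task queue, so submitting millions of tasks from the read loop would still hold them all in memory; a bounded queue with CallerRunsPolicy avoids that.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import java.util.stream.Stream;

// Hypothetical type for the "outputWriter" above: one shared Writer behind a
// synchronized method, so lines from different threads never interleave.
class SynchronizedOutput {
    private final Writer out;

    SynchronizedOutput(Writer out) {
        this.out = out;
    }

    synchronized void writeLine(String line) {
        try {
            out.write(line + System.lineSeparator());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class StreamingScan {
    // Submits one task per URL while the stream is read. The bounded queue holds at most
    // queueSize waiting tasks; when it is full, CallerRunsPolicy runs the task on the
    // reading thread itself, which pauses reading until the workers catch up.
    static void scan(Stream<String> urls, Function<String, String> check,
                     SynchronizedOutput output, int threads, int queueSize) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueSize),
                new ThreadPoolExecutor.CallerRunsPolicy());
        urls.forEach(url -> executor.execute(
                () -> output.writeLine(url + " -> " + check.apply(url))));
        executor.shutdown();
        try {
            // Wait for the remaining tasks before the caller closes the Writer.
            executor.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Results are written in completion order, not input order. With real files, the caller would open the Files.lines stream and the Writer in a try-with-resources block and pass a lambda that calls getStatus and catches its IOException.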
Upvotes: 1
Reputation: 131346
You can use an ExecutorService and set the number of threads to use. The ExecutorService instance handles the thread management for you. You just need to give it the tasks to execute and invoke them all with invokeAll(). When all the tasks have completed, you can get the results.
In the call() method of the Callable implementation, we return a String with a separator to indicate the URL and the response code of the request. For example: http://example3.com||301, http://example.com||200, etc.
I have not written the code to read the URLs from a file and store the results of the tasks in another file. You should not have great difficulty implementing it.
Here is the main class:
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class Main {
    public static void main(String[] args) throws InterruptedException {
        String[] hostList = { "http://example.com", "http://example2.com", "http://example3.com" };
        // At least one thread: availableProcessors() - 1 is 0 on a single-core machine,
        // and newFixedThreadPool(0) throws IllegalArgumentException
        int nbThreadToUse = Math.max(1, Runtime.getRuntime().availableProcessors() - 1);
        ExecutorService executorService = Executors.newFixedThreadPool(nbThreadToUse);
        Set<Callable<String>> callables = new HashSet<Callable<String>>();
        for (String host : hostList) {
            callables.add(new UrlCall(host));
        }
        // invokeAll blocks until every task has completed
        List<Future<String>> futures = executorService.invokeAll(callables);
        for (Future<String> future : futures) {
            try {
                String result = future.get();
                String[] keyValueToken = result.split("\\|\\|");
                String url = keyValueToken[0];
                String response = keyValueToken[1];
                System.out.println("url=" + url + ", response=" + response);
            } catch (ExecutionException e) {
                e.printStackTrace();
            }
        }
        executorService.shutdown();
    }
}
Here is UrlCall, the Callable implementation that performs the call to the URL. UrlCall takes the URL to test in its constructor.
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.Callable;

public class UrlCall implements Callable<String> {
    private String url;

    public UrlCall(String url) {
        this.url = url;
    }

    @Override
    public String call() throws Exception {
        return getStatus(url);
    }

    private String getStatus(String url) throws IOException {
        try {
            URL siteURL = new URL(url);
            HttpURLConnection connection = (HttpURLConnection) siteURL.openConnection();
            connection.setRequestMethod("HEAD");
            connection.connect();
            int code = connection.getResponseCode();
            return url + "||" + code;
        } catch (Exception e) {
            //FIXME to log of course
            return url + "||exception";
        }
    }
}
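The answer leaves reading the URL file and writing the results as an exercise. One way to do it while keeping memory bounded for millions of lines is to read the file in batches and call invokeAll once per batch. A sketch, where BatchedFileScan, the batch size, and passing the check as a Function<String, String> (instead of constructing UrlCall) are all assumptions:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Sketch: read batchSize URLs, run them with invokeAll, append the results, repeat.
class BatchedFileScan {
    static void scan(Path input, Path output, Function<String, String> check,
                     int threads, int batchSize) throws IOException, InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try (BufferedReader reader = Files.newBufferedReader(input);
             BufferedWriter writer = Files.newBufferedWriter(output)) {
            List<Callable<String>> batch = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                String url = line.trim();
                if (url.isEmpty()) {
                    continue; // skip blank lines
                }
                batch.add(() -> url + "||" + check.apply(url));
                if (batch.size() == batchSize) {
                    writeBatch(executor, batch, writer);
                }
            }
            writeBatch(executor, batch, writer); // leftover lines
        } finally {
            executor.shutdown();
        }
    }

    private static void writeBatch(ExecutorService executor, List<Callable<String>> batch,
                                   BufferedWriter writer) throws IOException, InterruptedException {
        // invokeAll blocks until every task in the batch has completed and returns
        // the futures in the same order as the list.
        for (Future<String> future : executor.invokeAll(batch)) {
            try {
                writer.write(future.get());
                writer.newLine();
            } catch (ExecutionException e) {
                e.printStackTrace();
            }
        }
        batch.clear();
    }
}
```

Because invokeAll preserves list order, the output file follows the input order. The trade-off is that each batch waits for its slowest URL, so a few hanging hosts can leave the other threads idle; connect and read timeouts on the HttpURLConnection limit that.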
Upvotes: 1
Reputation: 466
You will have issues sharing a file across threads. It is much better to read the file and then hand each record in it to a thread for processing.
Creating a thread is non-trivial resource-wise, so a thread pool is useful because it lets threads be reused.
Do you want all threads to write to a single file?
I would do that using a list shared between the threads and the writer. Others may have a better idea.
How to do all this depends on the Java version.
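One way to realize the shared-list idea is a BlockingQueue from java.util.concurrent: worker threads put result lines on the queue, and a single writer thread takes them off and is the only one touching the file. A sketch; QueueWriter and its END sentinel are hypothetical names:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.concurrent.BlockingQueue;

// Sketch: the only class that touches the Writer. Workers call queue.put(line);
// when all work is done, the producer side puts END to stop the writer.
class QueueWriter implements Runnable {
    // Hypothetical sentinel; it must be a string that never appears as a real result.
    static final String END = "__END_OF_RESULTS__";

    private final BlockingQueue<String> queue;
    private final Writer out;

    QueueWriter(BlockingQueue<String> queue, Writer out) {
        this.queue = queue;
        this.out = out;
    }

    @Override
    public void run() {
        try {
            while (true) {
                String line = queue.take(); // blocks until a result is available
                if (END.equals(line)) {
                    break;
                }
                out.write(line + System.lineSeparator());
            }
            out.flush();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Typically you would start it with new Thread(new QueueWriter(queue, writer)).start(), have each task call queue.put(url + " -> " + code), and, once the executor has terminated, put QueueWriter.END and join the writer thread. A bounded new LinkedBlockingQueue<>(capacity) makes fast workers wait in put() when the writer falls behind.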
Upvotes: 1