Reputation: 34735
I want to access forms on HTML pages through the Java programming language without involving a real browser in between.
At present I am doing it through HtmlUnit, but it takes a bit more time to load a page. When it comes to accessing millions of pages, this extra bit of time matters most.
Are there any other methods for doing this?
Upvotes: 0
Views: 2556
Reputation: 60498
I've used something similar called HttpUnit before, but I have no idea how it compares performance-wise.
If you have millions of pages to process, I would recommend throwing some more threads at it. Just a guess, but I think that if you scale this up to multiple threads, you'll run out of bandwidth before you run out of CPU power (in which case it won't matter how much faster the library itself could be).
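A rough sketch of the multi-threaded idea, using a fixed-size pool from java.util.concurrent. The class name ParallelFetcher, the method fetchAll and the fetch callback are made up for illustration; fetch stands in for whatever actually loads one page (HtmlUnit, HttpUnit or a plain URLConnection), and the right thread count is something you would have to measure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, not part of HtmlUnit or HttpUnit.
public class ParallelFetcher {

    // Runs `fetch` for every URL on a fixed-size thread pool and returns
    // the results in the same order as the input list.
    public static List<String> fetchAll(List<String> urls, int threads,
                                        Function<String, String> fetch) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per URL; the pool runs at most `threads` at a time.
            List<Future<String>> futures = new ArrayList<>();
            for (String url : urls) {
                Callable<String> task = () -> fetch.apply(url);
                futures.add(pool.submit(task));
            }
            // Collect results in submission order.
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                try {
                    results.add(f.get()); // blocks until that page is done
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown(); // stop accepting work and let the threads exit
        }
    }
}
```

You would call it as something like ParallelFetcher.fetchAll(urls, 20, url -> loadPage(url)), where loadPage is your own page-loading code. For millions of URLs you would probably submit them in batches rather than all at once, so the list of pending futures stays small.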
Upvotes: 2
Reputation: 9519
Most of the interaction in a browser comes down to an HTTP GET or an HTTP POST. You need to figure out exactly which operation you need, and then you can construct the URL and/or form data yourself. Then you can use something like this:
    // Needs: import java.io.*; import java.net.*;
    try {
        // Construct the URL-encoded form data
        String data = URLEncoder.encode("key1", "UTF-8") + "=" + URLEncoder.encode("value1", "UTF-8");
        data += "&" + URLEncoder.encode("key2", "UTF-8") + "=" + URLEncoder.encode("value2", "UTF-8");

        // Send the data as the body of a POST request
        URL url = new URL("http://hostname:80/cgi");
        URLConnection conn = url.openConnection();
        conn.setDoOutput(true);
        OutputStreamWriter wr = new OutputStreamWriter(conn.getOutputStream());
        wr.write(data);
        wr.flush();

        // Get the response
        BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        String line;
        while ((line = rd.readLine()) != null) {
            // Process line...
        }
        wr.close();
        rd.close();
    } catch (Exception e) {
        e.printStackTrace(); // at least report the failure instead of swallowing it
    }
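If the form has more than a couple of fields, building the data string by hand gets tedious. A small helper along these lines might do it from a map instead. FormEncoder is a made-up name for illustration; it only uses java.net.URLEncoder from the standard library, and it assumes the server expects the usual application/x-www-form-urlencoded format.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

// Hypothetical helper for illustration only.
public class FormEncoder {

    // Joins the fields as key=value pairs separated by '&',
    // URL-encoding every key and value as UTF-8.
    public static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(enc(field.getKey())).append('=').append(enc(field.getValue()));
        }
        return body.toString();
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

Pass it a LinkedHashMap if the order of the fields matters, and write the returned string to the connection's output stream in place of the hand-built data.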
Upvotes: 0
Reputation: 11438
Accessing a web page through a browser, even a headless one like HtmlUnit, is going to be slow. If the goal is testing, a better method is to test the layer just below the web interface, so that you don't need to access millions of pages. Instead, you test just enough pages to make sure that the web interface is using the lower layer correctly.
Upvotes: 0