Reputation: 427
Here is the idea:
It is sort of like how the Google Desktop application lets you search without opening a browser.
I just need a general push in the right direction on this (maybe a certain method I should look for). I'm not very familiar with the Java API.
Upvotes: 0
Views: 4150
Reputation: 3080
You can use Java's standard HttpURLConnection to send the search request. To parse the response, you can use Apache Tika, which extracts text from HTML pages.
Here is a simple example using HttpURLConnection:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SimpleHTTPRequest {
    public static void main(String[] args) {
        HttpURLConnection connection = null;
        try {
            // URL-encode the search term and pass it in the query string
            String query = URLEncoder.encode("test", "UTF-8");
            URL serverAddress = new URL("http://www.google.com/search?q=" + query);

            // Set up the connection. A GET request has no body, so nothing is
            // written to the output stream and no Content-type or Content-length
            // header is set (in the standard implementation, calling
            // getOutputStream() turns a GET into a POST).
            connection = (HttpURLConnection) serverAddress.openConnection();
            connection.setRequestMethod("GET");
            connection.setUseCaches(false);
            connection.setConnectTimeout(10000);
            connection.setReadTimeout(10000);

            // Read the result from the server; try-with-resources closes the reader
            StringBuilder sb = new StringBuilder();
            try (BufferedReader rd = new BufferedReader(
                    new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = rd.readLine()) != null) {
                    sb.append(line).append('\n');
                }
            }
            System.out.println(sb);
        } catch (IOException e) {
            // Also covers MalformedURLException and ProtocolException (both are IOExceptions)
            e.printStackTrace();
        } finally {
            // connection is still null if openConnection() failed
            if (connection != null) {
                connection.disconnect();
            }
        }
    }
}
The returned HTML can then be passed to Apache Tika to extract the text.
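To search for arbitrary terms rather than a fixed one, the term has to be URL-encoded before it goes into the address. A minimal sketch of that step, assuming the same search URL as above; `SearchUrl` and `build` are hypothetical names, not part of any library:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class SearchUrl {
    // Hypothetical helper: builds the search URL for an arbitrary term.
    static String build(String term) {
        try {
            // URLEncoder applies form encoding: spaces become '+',
            // reserved characters such as '&' and '+' become %XX escapes
            return "http://www.google.com/search?q=" + URLEncoder.encode(term, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("java api"));
        System.out.println(build("c++ & java"));
    }
}
```

The resulting string can be passed to `new URL(...)` in place of the hard-coded address.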
Upvotes: 1
Reputation: 166
You have to use the URL class to connect to the web.
For example:
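URL url1 = new URL(url);
// openStream() is shorthand for openConnection().getInputStream()
BufferedReader reader = new BufferedReader(new InputStreamReader(url1.openStream(), "UTF-8"));
StringBuilder data = new StringBuilder();
String line;
// Read until end of stream; available() == 0 does not mean the stream has ended
while ((line = reader.readLine()) != null)
{
data.append(line);
}
reader.close();
// Assumes the response body is JSON (JSONObject comes from the org.json library)
JSONObject jobj = new JSONObject(data.toString());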
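The reading step can be tried on its own, without a network connection. This is only a sketch: `StreamReading` and `readAll` are hypothetical names, and a `ByteArrayInputStream` stands in for the stream that `openStream()` would return.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class StreamReading {
    // Hypothetical helper: reads the whole stream as UTF-8 text, line by line.
    static String readAll(InputStream in) {
        StringBuilder data = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            // readLine() returns null only at end of stream
            while ((line = reader.readLine()) != null) {
                data.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return data.toString();
    }

    public static void main(String[] args) {
        // Stand-in for url.openStream(), so the sketch runs offline
        InputStream fake = new ByteArrayInputStream(
            "{\"q\":\n\"test\"}".getBytes(StandardCharsets.UTF_8));
        System.out.print(readAll(fake));
    }
}
```

Looping on `readLine()` until it returns null reads to the real end of the stream, whereas `available()` can report 0 while more data is still on its way.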
Upvotes: 0
Reputation: 2036
You can use the open-source Apache HttpComponents library, which makes this easier.
Upvotes: 0
Reputation: 7740
You have to learn about Java socket programming and how a web server works. With that background, you can use the HttpURLConnection class to establish the connection to the web server and download the content.
http://docs.oracle.com/javase/1.4.2/docs/api/java/net/HttpURLConnection.html
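To get a feel for what happens at the socket level, here is a rough sketch of a plain HTTP GET sent over a `Socket`. `RawHttpGet` and `buildRequest` are hypothetical names; the sketch assumes an unencrypted server on port 80 and does no parsing, redirect handling, or HTTPS, all of which HttpURLConnection would take care of.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class RawHttpGet {
    // Hypothetical helper: the text of a minimal HTTP/1.1 GET request.
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"   // ask the server to close when done
             + "\r\n";                   // blank line ends the headers
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            // No host given: only show what would be sent
            System.out.print(buildRequest("www.google.com", "/search?q=test"));
            return;
        }
        String host = args[0];
        try (Socket socket = new Socket(host, 80)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, "/").getBytes(StandardCharsets.US_ASCII));
            out.flush();
            // Print the raw response: status line, headers, blank line, body
            InputStream in = socket.getInputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                System.out.write(buffer, 0, n);
            }
            System.out.flush();
        }
    }
}
```

Run without arguments it only prints the request text; with a host name as the first argument it sends the request and dumps the server's reply.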
Upvotes: 0