Reputation: 108
When I send an HTTP request from Java, I get the response back as HTML. For example, I send this request: http://www.google.com/search?q=what%20is%20mango
and I get back the HTML of this page:
https://www.google.co.in/search?q=what+is+mango&rlz=1C1CHBF_enIN743IN743&oq=what+is+mango&aqs=chrome..69i57j0l5.4095j0j7&sourceid=chrome&ie=UTF-8
From this results page, I then want to send a request to the Wikipedia page listed in the results, copy the content about mango from that page, and write it to a file on my system.
This is the code I use to send the Google search request:
    package api_test;

    import java.io.*;
    import java.net.*;
    import java.util.*;

    public class HttpURLConnectionExample {

        private static final String USER_AGENT = "Mozilla/5.0";

        public static void main(String[] args) throws Exception {
            HttpURLConnectionExample http = new HttpURLConnectionExample();
            System.out.println("testing 1- send http get request");
            http.sendGet();
        }

        private void sendGet() throws Exception {
            Scanner s = new Scanner(System.in);
            System.out.println("enter the URL");
            // The URL is entered without the scheme; "http://" is prepended below
            String url = s.nextLine();
            URL obj = new URL("http://" + url);
            HttpURLConnection con = (HttpURLConnection) obj.openConnection();
            // optional, default is GET
            con.setRequestMethod("GET");
            // add request header
            con.setRequestProperty("User-Agent", USER_AGENT);
            int responseCode = con.getResponseCode();
            System.out.println("\nSending 'GET' request to URL : " + url);
            System.out.println("Response Code : " + responseCode);
            StringBuilder response = new StringBuilder();
            // try-with-resources closes the reader even if reading fails
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(con.getInputStream()))) {
                String inputLine;
                while ((inputLine = in.readLine()) != null) {
                    // readLine() strips the line break, so add it back
                    response.append(inputLine).append('\n');
                }
            }
            // print result
            System.out.println(response.toString());
        }
    }
Upvotes: 1
Views: 6421
Reputation: 742
I think what you need is an HTML parser, such as jsoup.
You could do something like this:
    Document doc = Jsoup.connect("http://www.google.com/search?q=what%20is%20mango").get();
    // first() returns null when the selector matches nothing, so check before using it
    Element result = doc.select("#search h3.r a").first();
    String link = result.attr("data-href");
I'm not sure how often Google's layout changes, but right now the CSS selector "#search h3.r a" works.
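Once you have the link, the remaining steps from the question (follow it and write the text to a file) could be sketched roughly as below. This is only a sketch under assumptions: `ResultSaver`, `unwrapGoogleLink` and `saveText` are hypothetical names, and it assumes the extracted link may be wrapped in Google's redirect form `/url?q=<target>&...`, which may not hold for every result or layout. The text passed to `saveText` would be whatever you pull out of the Wikipedia page with the parser; here it is a placeholder string.

```java
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of jsoup or the JDK.
class ResultSaver {

    // Assumption: the link may look like "/url?q=<target>&sa=...".
    // If it does, return the decoded target; otherwise return the link unchanged.
    static String unwrapGoogleLink(String href) throws UnsupportedEncodingException {
        String marker = "/url?q=";
        int start = href.indexOf(marker);
        if (start < 0) {
            return href; // already a direct link
        }
        int begin = start + marker.length();
        int end = href.indexOf('&', begin);
        String encoded = (end < 0) ? href.substring(begin) : href.substring(begin, end);
        return URLDecoder.decode(encoded, "UTF-8");
    }

    // Writes the given text to the target file as UTF-8, replacing existing content.
    static void saveText(String text, Path target) throws IOException {
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        String link = unwrapGoogleLink("/url?q=https://en.wikipedia.org/wiki/Mango&sa=U");
        System.out.println("Target page: " + link);

        // Placeholder for the text extracted from the Wikipedia page.
        Path out = Files.createTempFile("mango", ".txt");
        saveText("text extracted from the page goes here", out);
        System.out.println("Saved to " + out);
    }
}
```

In a real run you would fetch `link` with a second `Jsoup.connect(link).get()` call and pass the text of the elements you care about to `saveText`, with a path of your choice instead of a temporary file.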
Upvotes: 2