Manish singh

Reputation: 108

How to use HTML response to extract data in Java?

When I send an HTTP request from Java, I get the response back as HTML. For example, I send this request: http://www.google.com/search?q=what%20is%20mango

The response is the HTML of this page: https://www.google.co.in/search?q=what+is+mango&rlz=1C1CHBF_enIN743IN743&oq=what+is+mango&aqs=chrome..69i57j0l5.4095j0j7&sourceid=chrome&ie=UTF-8

From this response page, I want to send a second request to the Wikipedia page listed in the results, copy the content about mango from that page, and write it to a file on my system.

This is the code I use to send the Google search request:

package api_test;

import java.io.*;
import java.net.*;
import java.util.*;

public class HttpURLConnectionExample {

    private static final String USER_AGENT = "Mozilla/5.0";

    public static void main(String[] args) throws Exception {

        HttpURLConnectionExample http= new HttpURLConnectionExample();

        System.out.println("testing 1- send http get request");
        http.sendGet();

    }

    private void sendGet() throws Exception{

        Scanner s= new Scanner(System.in);
        System.out.println("enter the URL");
        String url = s.nextLine();

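        // Prepend a scheme only when the user did not type one,
        // otherwise "http://www.google.com/..." would become "http://http://..."
        if (!url.startsWith("http://") && !url.startsWith("https://")) {
            url = "http://" + url;
        }
        URL obj = new URL(url);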
        HttpURLConnection con = (HttpURLConnection) obj.openConnection();

        // optional default is GET
        con.setRequestMethod("GET");

        //add request header
        con.setRequestProperty("User-Agent", USER_AGENT);

        int responseCode = con.getResponseCode();
        System.out.println("\nSending 'GET' request to URL : " + url);
        System.out.println("Response Code : " + responseCode);

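        // Read the body line by line; try-with-resources closes the reader even on error
        StringBuilder response = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(con.getInputStream()))) {
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                // readLine() strips the line terminator, so add it back
                response.append(inputLine).append('\n');
            }
        }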

        //print result
        System.out.println(response.toString());
    }

}

Upvotes: 1

Views: 6421

Answers (1)

Jairton Junior

Reputation: 742

I think what you need is an HTML parser, like jsoup.

You could do something like this:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
Document doc = Jsoup.connect("http://www.google.com/search?q=what%20is%20mango").get();
Element result = doc.select("#search h3.r a").first();
String link = result.attr("data-href");

I'm not sure how often Google's markup changes, but right now the CSS selector "#search h3.r a" works.
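
For the remaining steps in the question (following the Wikipedia link and saving the text to a file), here is a rough sketch that uses only the JDK, so it runs without extra dependencies. It swaps jsoup for a regular expression and a crude tag stripper, which is much less robust than a real parser. The class and method names (WikiLinkSketch, fetch, firstWikipediaLink, stripTags, saveText) and the output file mango.txt are made up for illustration. It assumes the search results contain an absolute en.wikipedia.org/wiki/... URL, that the pages are UTF-8, and that you are on Java 9 or later (for readAllBytes).

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; none of these names come from the question or the answer.
class WikiLinkSketch {

    // Assumption: the link shows up in the HTML as an absolute English Wikipedia URL.
    private static final Pattern WIKI_URL =
            Pattern.compile("https?://en\\.wikipedia\\.org/wiki/[^\"'&<>\\s]+");

    // Downloads a page as a string, much like sendGet() in the question.
    // Assumption: the body is UTF-8.
    static String fetch(String url) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        con.setRequestProperty("User-Agent", "Mozilla/5.0");
        try (InputStream in = con.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Returns the first Wikipedia article URL found in the HTML, if any.
    static Optional<String> firstWikipediaLink(String html) {
        Matcher m = WIKI_URL.matcher(html);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    // Crude tag stripper: drops script/style blocks, then every remaining tag,
    // then collapses runs of whitespace into single spaces.
    static String stripTags(String html) {
        return html
                .replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ")
                .replaceAll("(?s)<[^>]+>", " ")
                .replaceAll("\\s+", " ")
                .trim();
    }

    // Writes the text to the file as UTF-8, replacing any existing content.
    static void saveText(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        String searchHtml = fetch("http://www.google.com/search?q=what%20is%20mango");
        Optional<String> link = firstWikipediaLink(searchHtml);
        if (!link.isPresent()) {
            System.err.println("No Wikipedia link found in the search results");
            return;
        }
        String articleText = stripTags(fetch(link.get()));
        saveText(Paths.get("mango.txt"), articleText);
    }
}
```

If you already have jsoup on the classpath, Jsoup.connect(link).get().text() would be a more reliable replacement for fetch plus stripTags; only the saveText step stays the same.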

Upvotes: 2
