Jsoup parsing html tag from page

Question

I am trying to write a generic parser that works on any page. The relevant code is:

Elements title = doc.select("title");
Elements metades = doc.select("meta[name=description]");

As you can see, I want to extract the title tag.

It works fine on almost every website, for example hinddroid.com, but it is unable to parse the title from google.com and youtube.com. I think this is because there is no whitespace between adjacent tags; most big websites strip whitespace from their HTML to save bandwidth. Please advise: I want to be able to parse the HTML of any website.

Full code:

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;
import java.sql.*;
import java.util.regex.*;

import org.jsoup.Jsoup;
import org.jsoup.helper.Validate;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class post_link extends HttpServlet
{
    @Override
    public void doGet(HttpServletRequest request, HttpServletResponse response)
            throws IOException, ServletException
    {
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();

        try
        {
            //out.println("");
            String linktopro = "http://"+request.getParameter("link_topro");
            //String linktopro = "http://hinddroid.com";
            Document doc = Jsoup.connect(linktopro).userAgent("Mozilla/5.0 (Windows; U; WindowsNT 5.1; en-US; rv1.8.1.6) Gecko/20070725 Firefox/2.0.0.6").timeout(3000).get();
            Elements png = doc.select("img[src]");
            Elements title = doc.select("title:first-child");
            //Elements title = doc.title();
            Elements metades = doc.select("meta[name=description]");
            Pattern p1 = Pattern.compile("http://.*|.com*?.(com)");

            out.println("");

            String outlink = title.html() + "\n\n" + metades.attr("content");
            out.println(outlink);
            out.print("");
        }
        catch(Exception ex)
        {
            out.print(ex);
        }
        finally
        {
            out.close();
        }
    }
}
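
The whitespace hypothesis can be checked offline, without fetching any site. The sketch below assumes jsoup is on the classpath and uses two made-up HTML strings (hypothetical, not the real google.com or youtube.com markup). It compares the plain `title` selector from the short snippet with the `title:first-child` selector used in the full code. In jsoup, `:first-child` only matches an element that is the first element child of its parent, so a `<title>` that comes after a `<meta>` in `<head>` would not match. That, rather than missing whitespace, may be why the title comes back empty on some pages.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class TitleSelectorCheck {

    // Title text found with the plain tag selector from the short snippet.
    static String titleByTag(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("title").text();
    }

    // Title text found with the selector from the full servlet code.
    // ":first-child" requires <title> to be the first element of its parent.
    static String titleByFirstChild(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("title:first-child").text();
    }

    // Title text via the Document.title() convenience method (returns a String).
    static String titleByDocument(String html) {
        return Jsoup.parse(html).title();
    }

    public static void main(String[] args) {
        // Hypothetical minified page: no whitespace, <meta> comes before <title>.
        String metaFirst =
            "<html><head><meta charset=\"utf-8\"><title>Example</title></head><body></body></html>";
        // Hypothetical minified page: no whitespace, <title> comes first.
        String titleFirst =
            "<html><head><title>Example</title><meta charset=\"utf-8\"></head><body></body></html>";

        // Square brackets make an empty result visible in the output.
        System.out.println("meta first,  title:             [" + titleByTag(metaFirst) + "]");
        System.out.println("meta first,  title:first-child: [" + titleByFirstChild(metaFirst) + "]");
        System.out.println("meta first,  doc.title():       [" + titleByDocument(metaFirst) + "]");
        System.out.println("title first, title:first-child: [" + titleByFirstChild(titleFirst) + "]");
    }
}
```

Both strings are free of whitespace between tags, yet only the one where `<meta>` precedes `<title>` yields an empty result for `title:first-child`. If that matches what happens on the failing sites, `doc.select("title")` or `doc.title()` should be enough. Note that `doc.title()` returns a `String`, so it cannot be assigned to an `Elements` variable as in the commented-out line.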
