Reputation: 5240
I'm using the following code to retrieve data from the internet, but I also get the HTTP headers, which are useless to me.
URL url = new URL(webURL);
URLConnection conn = url.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null)
    System.out.println(inputLine);
in.close();
How can I get only the HTML data, without any headers or anything else?
Regards
Upvotes: 1
Views: 813
Reputation: 94645
You are already retrieving the correct data with URLConnection. However, if you want to read or access a particular HTML tag, you have to use an HTML parser. I suggest you use jsoup.
Example:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

Document doc = Jsoup.connect("http://your_url/").get();
Element head = doc.head();       // <head> tag content
Element body = doc.body();       // <body> tag content
System.out.println(doc.text());  // only the text inside <html>
Upvotes: 1
Reputation: 613
Do you mean to convert HTML into text? If so, you can use org.htmlparser.*. Take a look at http://htmlparser.sourceforge.net/
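To show the "HTML to text" idea without pulling in a library, here is a minimal stdlib-only sketch using a naive regex tag stripper. It is only an illustration: a real parser such as org.htmlparser or jsoup handles comments, scripts, and entities correctly, which this does not.

```java
// Naive "HTML to text" sketch: strip every <...> tag with a regex.
// Illustration only; a real HTML parser is the robust choice.
public class TagStripper {

    // Replace each tag with a space, then collapse runs of whitespace.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ")  // drop tags
                   .replaceAll("\\s+", " ")     // collapse whitespace
                   .trim();
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Title</h1><p>Hello, <b>world</b>!</p></body></html>";
        System.out.println(stripTags(html)); // prints: Title Hello, world !
    }
}
```

Note the limitation visible in the output: replacing tags with spaces keeps words from merging, but it also introduces a space before the "!" that a real parser would not.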
Upvotes: 0
Reputation: 60414
Retrieving and parsing a document using TagSoup:
Parser p = new Parser();
SAX2DOM sax2dom = new SAX2DOM();
URL url = new URL("http://stackoverflow.com");
p.setContentHandler(sax2dom);
p.parse(new InputSource(new InputStreamReader(url.openStream())));
org.w3c.dom.Node doc = sax2dom.getDOM();
The TagSoup and SAX2DOM packages are:
import org.ccil.cowan.tagsoup.Parser;
import org.apache.xalan.xsltc.trax.SAX2DOM;
Writing the contents to System.out:
TransformerFactory tFact = TransformerFactory.newInstance();
Transformer transformer = tFact.newTransformer();
Source source = new DOMSource(doc);
Result result = new StreamResult(System.out);
transformer.transform(source, result);
These all come from javax.xml.transform.*.
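To see the javax.xml.transform half of this in isolation, here is a self-contained sketch that uses the JDK's built-in XML parser in place of TagSoup (so it only accepts well-formed markup), then serializes the DOM back out with the same DOMSource -> StreamResult pipeline as above, writing to a StringWriter instead of System.out.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DomToString {

    // Parse well-formed markup with the JDK parser (standing in for the
    // TagSoup + SAX2DOM step), then serialize the DOM back to a String.
    static String roundTrip(String markup) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("<html><body><p>Hello</p></body></html>"));
    }
}
```

The advantage of TagSoup over the JDK parser is precisely that it tolerates the messy, non-well-formed HTML found on real pages; the transform step afterwards is identical either way.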
Upvotes: 1
Reputation: 2335
You can scan the complete response yourself and keep only the data between the <html> tags.
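A minimal sketch of that idea: take the substring from the first <html ...> tag through the closing </html>. The class and method names here are made up for illustration, and the matching is crude (lowercasing, first occurrence only), so a real parser remains the safer choice.

```java
// Hypothetical sketch: keep only the substring between <html ...> and </html>.
public class HtmlSlice {

    static String betweenHtmlTags(String response) {
        String lower = response.toLowerCase();
        int start = lower.indexOf("<html");      // also matches <html lang="..."> etc.
        int end = lower.indexOf("</html>");
        if (start < 0 || end < 0) {
            return response;                     // no <html> tags: return input unchanged
        }
        return response.substring(start, end + "</html>".length());
    }

    public static void main(String[] args) {
        String raw = "noise before <html><body>hi</body></html> noise after";
        System.out.println(betweenHtmlTags(raw)); // prints: <html><body>hi</body></html>
    }
}
```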
Upvotes: 0