Reputation: 141
I'm new to Java programming. I want only the text content of a web page, but the program I have prints the HTML tags along with the content, which I don't want.
Can anyone help me with this?
Thank you.
My code looks like this:
import java.net.*;
import java.io.*;

public class URLReader {
    public static void main(String[] args) throws Exception {
        URL oracle = new URL("http://www.oracle.com/");
        BufferedReader in = new BufferedReader(
                new InputStreamReader(oracle.openStream()));

        // Prints the raw page source line by line, tags included.
        String inputLine;
        while ((inputLine = in.readLine()) != null)
            System.out.println(inputLine);
        in.close();
    }
}
Upvotes: 0
Views: 88
Reputation: 388
If you want only part of the web page, such as its text, you have to parse the HTML you receive; there is no way around it. Reading the page through your InputStreamReader gives you exactly what the browser gets: the raw HTML source.
The only difference between the browser and your code is that the browser interprets that markup and renders it.
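You'll need to parse the HTML you got to find the text you want. Keep in mind that HTML is not XML: an XML parser only accepts well-formed markup (XHTML, for example), and many real pages are not well-formed.
Here is a tutorial on the built-in Java XML (DOM) parser: https://www.tutorialspoint.com/java_xml/java_dom_parser.htm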
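As a rough sketch of the DOM approach, something like the following might work. It assumes the markup is well-formed XML (for example XHTML without a DOCTYPE); ordinary HTML with unclosed tags will be rejected by the parser. The class name TextExtractor, the method extractText, and the sample markup are made up for illustration and are not from the tutorial.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class; the name is an assumption for this example.
class TextExtractor {

    // Returns the concatenated text of all elements, without the tags.
    // Assumes the markup is well-formed XML; anything else is rejected.
    static String extractText(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(markup)));
            // getTextContent() joins the text of every descendant node.
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse input as XML", e);
        }
    }

    public static void main(String[] args) {
        // Sample markup, invented for the example.
        String page = "<html><body><h1>Title</h1><p>Some text.</p></body></html>";
        System.out.println(extractText(page));
    }
}
```

To use it with the code from the question, you could collect the lines read from the URL into a single String and pass that to extractText. If the page is not well-formed, a parser written for HTML rather than XML is likely the better fit.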
Upvotes: 1