dev_marshell08

Reputation: 1101

How to format webpage source code in Java?

The code below fetches the source of the page at a given URL without any errors. What I am looking for is a way to format the source I receive.

Until now I did this manually: I went to http://www.freeformatter.com/html-formatter.html, pasted the source, and formatted it with the "3 spaces per indent" option. How do I get my Java code to do the same formatting for me?

I want it formatted because I have another script that reads it line by line, saves the data it needs, and ignores the rest.

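 import java.io.BufferedReader;
 import java.io.IOException;
 import java.io.InputStreamReader;
 import java.net.URL;
 import java.net.URLConnection;

 // Downloads the page at the given URL and returns its source as one string.
 private static String getUrlSource(String url) throws IOException {
     URLConnection connection = new URL(url).openConnection();
     StringBuilder source = new StringBuilder();
     // try-with-resources closes the reader even if reading fails
     try (BufferedReader in = new BufferedReader(new InputStreamReader(
             connection.getInputStream(), "UTF-8"))) {
         String inputLine;
         while ((inputLine = in.readLine()) != null) {
             source.append(inputLine).append("\n");
         }
     }
     return source.toString();
 }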

 public static void main(String[] args) {
     System.out.println("Hello");

     // url must be declared before it is used
     String url = "http://www.bctransit.com/regions/cfv/schedules/schedule.cfm?p=day.text&route=1%3A0&day=1&";

     try {
         String value = getUrlSource(url);
         System.out.println(value);
     } catch (IOException e) {
         e.printStackTrace();
     }
 }

Upvotes: 0

Views: 298

Answers (1)

Krumelur

Reputation: 32497

If you are scraping a web page, I suggest using a real HTML parser instead. Reading formatted HTML line by line is bound to fail sooner or later.

I would recommend having a look at jsoup. While I have never used it, I have had great results with its Python counterpart, Beautiful Soup.

Using a library such as jsoup will get you a nice object model to traverse instead of relying on string manipulation.
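To illustrate what traversing the object model could look like, here is a minimal sketch. It assumes the jsoup library is on the classpath; the class name `ScheduleCells`, the helper `cellTexts`, and the sample markup are made up for illustration, and the real schedule page will need a different selector.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class ScheduleCells {
    // Returns the text of every <td> element, in document order.
    static List<String> cellTexts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> cells = new ArrayList<>();
        for (Element td : doc.select("td")) {
            cells.add(td.text());
        }
        return cells;
    }

    public static void main(String[] args) {
        // Hypothetical markup; inspect the real page to find the right selector.
        String html = "<table><tr><td>Stop A</td><td>6:05</td></tr></table>";
        System.out.println(cellTexts(html));
    }
}
```

A CSS selector picks out the elements you care about regardless of how the source happens to be indented, so the line-by-line script should no longer be needed.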

As a bonus, jsoup will actually format the HTML string for you, should you want that anyway.
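If you do want the formatted string, something along these lines should work. This is a sketch that assumes jsoup is on the classpath; `PrettyHtml` and `format` are hypothetical names, and the indent of 3 mirrors the option mentioned in the question. Note that jsoup normalizes the document (for example, it adds missing `html`, `head` and `body` elements), so the output will not necessarily match the online formatter character for character.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class PrettyHtml {
    // Parses the HTML and re-serializes it with 3 spaces per indent level.
    static String format(String html) {
        Document doc = Jsoup.parse(html);
        doc.outputSettings().prettyPrint(true).indentAmount(3);
        return doc.outerHtml();
    }

    public static void main(String[] args) {
        // In the question's code, the argument would be the result of getUrlSource(url).
        System.out.println(format("<div><p>hi</p></div>"));
    }
}
```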

Upvotes: 2
