Reputation: 1101
The code below fetches the source of the page at a given URL without any errors. What I am looking for is a way to format the source code I receive.
Until now I did this manually: I went to http://www.freeformatter.com/html-formatter.html, pasted in the source, and formatted it with the 3-spaces-per-indent option. How do I get my Java code to do the same formatting for me?
I want it formatted because I have another script that reads it line by line, saves the data it needs, and ignores the rest.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class UrlSource {
    // Downloads the page at the given URL and returns its source as one string.
    private static String getUrlSource(String url) throws IOException {
        URL x = new URL(url);
        URLConnection yc = x.openConnection();
        StringBuilder a = new StringBuilder();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                yc.getInputStream(), "UTF-8"))) {
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                a.append(inputLine);
                a.append("\n");
            }
        }
        return a.toString();
    }

    public static void main(String[] args) {
        System.out.println("Hello");
        String url = "http://www.bctransit.com/regions/cfv/schedules/schedule.cfm?p=day.text&route=1%3A0&day=1&";
        try {
            String value = getUrlSource(url);
            System.out.println(value);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Upvotes: 0
Views: 298
Reputation: 32497
If you are scraping a web page, I suggest using a real HTML parser instead. Your method is bound to fail sooner or later.
I would recommend having a look at jsoup. While I have never used it, I have had great results with its Python counterpart, Beautiful Soup.
Using a library such as jsoup will get you a nice object model to traverse instead of relying on string manipulation.
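To illustrate what traversing the object model might look like, here is a minimal sketch. It assumes the jsoup library is on the classpath and that the data of interest sits in table cells; the class name TableCellExample, the method cellTexts, and the sample HTML are made up for illustration and are not taken from the page in the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class TableCellExample {
    // Returns the text of every <td> cell found in the given HTML.
    // (Hypothetical helper; the "td" selector is an assumption about the page layout.)
    static List<String> cellTexts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> cells = new ArrayList<>();
        for (Element td : doc.select("td")) {
            cells.add(td.text());
        }
        return cells;
    }

    public static void main(String[] args) {
        // Made-up sample input standing in for the downloaded page source.
        String html = "<table><tr><td>6:15</td><td>6:45</td></tr></table>";
        System.out.println(cellTexts(html));
    }
}
```

With this approach the line layout of the source no longer matters, since the selector finds the cells wherever they appear in the markup.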
As a bonus, jsoup will actually format the HTML string for you, should you want that anyway.
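If the formatted string is still wanted, something along these lines should work. This is a sketch under the assumption that jsoup is on the classpath; HtmlFormatExample and formatHtml are hypothetical names, and the sample HTML is invented. Note that jsoup normalizes the document while parsing (for example, it adds missing html, head, and body elements), so the output may not match the freeformatter.com result exactly.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class HtmlFormatExample {
    // Parses the HTML and returns it pretty-printed with the given
    // number of spaces per indent level. (Hypothetical helper.)
    static String formatHtml(String html, int indent) {
        Document doc = Jsoup.parse(html);
        doc.outputSettings().prettyPrint(true).indentAmount(indent);
        return doc.outerHtml();
    }

    public static void main(String[] args) {
        // Made-up sample input; in the question this would be the
        // string returned by getUrlSource(url).
        String html = "<html><body><table><tr><td>6:15</td></tr></table></body></html>";
        System.out.println(formatHtml(html, 3));
    }
}
```

Passing 3 as the indent corresponds to the 3-spaces-per-indent option mentioned in the question.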
Upvotes: 2