Reputation: 1
I have been given a task to scan the contents of a website's source code, and use delimiters to extract all hyperlinks from the site and display them. After some looking around online this is what I have so far:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Scanner;
public class HyperlinkMain {
public static void main(String[] args) {
try {
Scanner in = new Scanner (System.in);
String URL = in.next();
URL website = new URL(URL);
BufferedReader input = new BufferedReader(new InputStreamReader(website.openStream()));
String inputLine;
while ((inputLine = input.readLine()) != null) {
// Process each line.
System.out.println(inputLine);
}
in.close();
} catch (MalformedURLException me) {
System.out.println(me);
} catch (IOException ioe) {
System.out.println(ioe);
}
}
}
So my program can extract each line from the source code of a website and display it, but realistically I want it to extract each WORD as such from the source code rather than every line. I don't really know how it's done because I keep getting errors when I use input.read();
Upvotes: 0
Views: 1265
Reputation: 2148
There is lots of source code around to retrieve web pages. Look at the Pattern
class to see how to regex text for hyperlinks. You can treat your homework assignment as two separate problems by working on the hyperlink extraction separately from the web page downloads.
Upvotes: 1