user2922456

Reputation: 391

How to get all links (<a href>) from a URL

I am given a URL, and I need to find all the links in that page and just display them, that's all.

I wrote this in Java:

        PrintWriter writer = new PrintWriter("Web.txt");

        URL oracle = new URL("http://edition.cnn.com/");
        BufferedReader in = new BufferedReader(
                new InputStreamReader(oracle.openStream()));

        String inputLine;
        while ((inputLine = in.readLine()) != null)
        {
            writer.println(inputLine);
            System.out.println(inputLine);
        }
        in.close();
        writer.close();

Now my question is: how can I find only the links in this huge file?

I thought about searching for <a href" ... ... ..> patterns, but that's not always right..
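To illustrate the concern: a pattern-based search with `java.util.regex` can pull out many hrefs, but it silently misses valid HTML variants (unquoted or single-quoted attributes, attributes split across lines, links inside comments). This is a minimal sketch of that naive approach, with a hypothetical class name, shown only to demonstrate why an HTML parser is the safer choice:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkRegex {
    // Naive href extraction: only matches double-quoted href attributes.
    // It misses unquoted/single-quoted values and other legal HTML forms.
    static List<String> extractHrefs(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = Pattern
                .compile("<a\\s[^>]*href\\s*=\\s*\"([^\"]*)\"",
                        Pattern.CASE_INSENSITIVE)
                .matcher(html);
        while (m.find()) {
            links.add(m.group(1)); // captured href value
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://example.com\">ok</a>"
                + " <a href=unquoted>missed</a>";
        // The unquoted link is silently dropped by the regex.
        System.out.println(extractHrefs(html));
    }
}
```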

Thanks

Upvotes: 0

Views: 528

Answers (1)

everag

Reputation: 7672

Jsoup is the way to go! It's a Java API with which you can parse HTML documents (either local or external) and navigate their DOM structure using a jQuery-like selector syntax.

Your code to get all the links should look something like this:

Document doc = Jsoup.connect("http://edition.cnn.com").get(); // Parse this URL's HTML
Elements elements = doc.select("a"); // Search for all <a> elements

Then, to list every link and save it to your file:

for (Element element : elements) {
    writer.println(element.attr("href")); // Get the "href" attribute from the element
}
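Putting the pieces together, a complete sketch might look like the following. It assumes the Jsoup jar is on the classpath and reuses the `Web.txt` file name from the question; the `abs:href` prefix asks Jsoup to resolve relative links against the page's base URL:

```java
import java.io.PrintWriter;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class LinkLister {
    public static void main(String[] args) throws Exception {
        // Fetch and parse the page
        Document doc = Jsoup.connect("http://edition.cnn.com").get();
        // Select only anchors that actually carry an href attribute
        Elements elements = doc.select("a[href]");

        // try-with-resources closes the writer even on error
        try (PrintWriter writer = new PrintWriter("Web.txt")) {
            for (Element element : elements) {
                String link = element.attr("abs:href"); // absolute URL
                writer.println(link);
                System.out.println(link);
            }
        }
    }
}
```

Using `a[href]` instead of plain `a` skips anchors without a link target, and `attr("abs:href")` avoids writing relative paths like `/world` to the file.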

Upvotes: 1
