netha

Reputation: 411

get links in a web site

How can I get the links in a web page without displaying it? (Basically what I want is this: a user enters a URL, and I want to list all the links available inside that page.) Can you please tell me a way to achieve this?

Upvotes: 3

Views: 2726

Answers (5)

Jigar Joshi

Reputation: 240908

Here is example Java code that fetches a page and prints every link, using the Swing HTML parser:

import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;

import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class Main {
  public static void main(String args[]) throws Exception {
    URL url = new URL(args[0]);
    Reader reader = new InputStreamReader((InputStream) url.getContent());
    System.out.println("<HTML><HEAD><TITLE>Links for " + args[0] + "</TITLE>");
    System.out.println("<BASE HREF=\"" + args[0] + "\"></HEAD>");
    System.out.println("<BODY>");
    new ParserDelegator().parse(reader, new LinkPage(), false);
    System.out.println("</BODY></HTML>");
  }
}

class LinkPage extends HTMLEditorKit.ParserCallback {

  public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
    if (t == HTML.Tag.A) {
      System.out.println("<A HREF=\"" + a.getAttribute(HTML.Attribute.HREF) + "\">"
          + a.getAttribute(HTML.Attribute.HREF) + "</A><BR>");
    }
  }

}

Upvotes: 2

Krish Nakum R

Reputation: 545

// Note: this assumes a helper extractLinks(String) that fetches the
// page at `site` and returns its links; it is not shown in the answer.
public void extract_link(String site)
{
    try {
        List<String> links = extractLinks(site);
        for (String link : links) {
            System.out.println(link);
        }

    } catch (Exception e) {
        e.printStackTrace();
    }
}

This is a simple function to print all the links in a page. If you want to follow the links inside those pages as well, just call it recursively (but make sure you set a depth limit according to your needs).
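As an illustration, here is a minimal sketch of such a recursive crawl with a depth limit. The in-memory `pages` map and the regex-based `extractLinks` are assumptions made so the sketch runs without network access; the answer's own `extractLinks` helper is not shown, so this is only one way it might look.

```java
import java.util.*;
import java.util.regex.*;

public class LinkCrawler {
    // Hypothetical in-memory "site": URL -> HTML body, so the sketch
    // runs without any network access.
    static Map<String, String> pages = new HashMap<>();

    // Extract href values from anchor tags with a simple regex.
    // (A real HTML parser is more robust; this is only an illustration.)
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = Pattern.compile("<a\\s+href=\"([^\"]+)\"").matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // Recursively visit links up to maxDepth, tracking visited URLs
    // to avoid looping on pages that link to each other.
    static void crawl(String url, int depth, int maxDepth, Set<String> visited) {
        if (depth > maxDepth || !visited.add(url)) {
            return;
        }
        String html = pages.get(url);
        if (html == null) {
            return;
        }
        for (String link : extractLinks(html)) {
            System.out.println(link);
            crawl(link, depth + 1, maxDepth, visited);
        }
    }

    public static void main(String[] args) {
        pages.put("a", "<a href=\"b\">to b</a>");
        pages.put("b", "<a href=\"a\">back to a</a>");
        crawl("a", 0, 1, new HashSet<>());
    }
}
```

Without the `visited` set, the two pages above would send the crawl into an infinite loop even with a depth limit of 1 per branch, which is why both safeguards matter.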

Upvotes: 0

Spilarix

Reputation: 1468

Just open a URLConnection, get the page, and parse it.
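A minimal sketch of that approach, assuming the page fits in memory; a local `file:` URL stands in for a web page here so the example runs without network access:

```java
import java.io.*;
import java.net.*;
import java.nio.file.*;

public class FetchPage {
    // Open a URLConnection, read the whole page into a String,
    // then hand it off to whatever parser you choose.
    static String fetch(URL url) throws IOException {
        URLConnection conn = url.openConnection();
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // A temp file stands in for a remote page in this sketch.
        Path tmp = Files.createTempFile("page", ".html");
        Files.write(tmp, "<a href=\"http://example.com\">link</a>".getBytes());
        System.out.println(fetch(tmp.toUri().toURL()));
    }
}
```

For a real `http:` URL the same `fetch` call works unchanged, since `URLConnection` dispatches on the URL's protocol.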

Upvotes: 0

Colin Hebert

Reputation: 93177

If you want the content of a page, you'll have to load it. What you can do, though, is load it in memory and parse it to get all the <a> tags and their content.

You'll be able to parse this markup with tools like JDOM or SAX if you're working with Java (as your tag says), or with simple DOM tools in JavaScript.
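The load-in-memory-and-parse approach can also be sketched with the JDK's own Swing HTML parser (the one the accepted answer uses), which tolerates real-world HTML better than the XML tools named above; the `collectLinks` helper and the sample HTML are illustrative assumptions, not part of this answer:

```java
import java.io.*;
import java.util.*;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class LinkCollector {
    // Parse HTML held in memory and collect every href on an <a> tag.
    static List<String> collectLinks(String html) throws IOException {
        List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (t == HTML.Tag.A) {
                    Object href = a.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return links;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><a href=\"http://example.com\">x</a></body></html>";
        System.out.println(collectLinks(html));
    }
}
```

Returning a `List` instead of printing, as here, makes the links easy to filter or crawl further.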


Upvotes: 0

Pointy

Reputation: 413737

You'll have to load the page on your server and then find the links, preferably by loading up the document in an HTML/XML parser and traversing that DOM. The server could then send the links back to the client.

You can't do it on the client because the browser's same-origin policy won't let your JavaScript code look at the contents of a page from a different domain.

Upvotes: 0
