Reputation:
I have HTML pages as Strings in Java and I need to extract the JavaScript links from them. Is there a good, easy-to-use library for this? I looked at Cobra and Neko, but I don't think (maybe I'm wrong) that they offer what I need, such as getting tag-specific content.
Upvotes: 0
Views: 899
Reputation: 42849
Take a look at jsoup. It is an HTML parser with a selector DSL (domain-specific language) for finding elements in the DOM.
For example, to find all <a> tags with an href attribute, you would do this:
Document doc = Jsoup.connect("http://www.google.com/").get();
Elements hrefAnchors = doc.select("a[href]");
If you already have the HTML downloaded as a String, you can use the parse(String) method:
String html = "<p>Welcome to <a href='http://www.google.com/'>Google</a>.</p>";
Document doc = Jsoup.parse(html);
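Since the question is about JavaScript links specifically, the same selector approach should work for script tags. Below is a minimal sketch, assuming "JavaScript links" means the src attribute of <script> elements. The class name ScriptLinkExtractor, the helper extractScriptSrcs, and the example.com base URI are made up for illustration; Jsoup.parse(String, String), select, and absUrl are jsoup calls.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ScriptLinkExtractor {

    // Hypothetical helper (not part of jsoup): collects the src of every
    // <script> element that has one. Inline scripts without src are skipped.
    public static List<String> extractScriptSrcs(String html, String baseUri) {
        // The base URI lets jsoup resolve relative paths such as "/js/app.js".
        Document doc = Jsoup.parse(html, baseUri);
        List<String> srcs = new ArrayList<>();
        for (Element script : doc.select("script[src]")) {
            // absUrl resolves the attribute value against the base URI.
            srcs.add(script.absUrl("src"));
        }
        return srcs;
    }

    public static void main(String[] args) {
        // Assumed sample input: one external script and one inline script.
        String html = "<html><head>"
                + "<script src='/js/app.js'></script>"
                + "<script>var inline = true;</script>"
                + "</head><body><p>Hello</p></body></html>";
        for (String src : extractScriptSrcs(html, "http://example.com/")) {
            System.out.println(src);
        }
    }
}
```

If you do not know the page's original URL, you can call Jsoup.parse(html) as above and read the raw value with attr("src") instead; relative paths will then stay relative.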
Upvotes: 1