Reputation: 400
I have a string containing HTML source code. I want to extract only the links from that string and put them into an ArrayList. In other words, I want the string inside each <a href="THE LINK I WANT">.
I want to do this without using any external libraries. How can I do it with a simple algorithm using only the String class and loops? Thank you!
Upvotes: 1
Views: 4036
Reputation: 400
I've found the answer!
public ArrayList<String> getLinks() {
    // url is assumed to be a field holding the HTML source
    ArrayList<String> links = new ArrayList<>();
    for (int i = 0; i < url.length() - 6; i++) {
        // Look for the start of an href="..." attribute
        if (url.startsWith("href=\"", i)) {
            // The link ends at the next double quote
            int end = url.indexOf('"', i + 6);
            if (end != -1) {
                links.add(url.substring(i + 6, end));
                i = end; // Continue scanning after this link
            }
        }
    }
    return links;
}
Upvotes: 1
Reputation: 5253
The Java Regex API is not a proper tool for this task. Use the efficient, secure and well-tested high-level tools mentioned in the other answers.
If your question is about the Regex API itself rather than a real-life problem (for learning purposes, for example), you can do it with the following code:
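import java.util.regex.Matcher;
import java.util.regex.Pattern;

String html = "foo <a href='link1'>bar</a> baz <a href='link2'>qux</a> foo";
Pattern p = Pattern.compile("<a href='(.*?)'>");
Matcher m = p.matcher(html);
while (m.find()) {
    System.out.println(m.group(0)); // the whole matched tag
    System.out.println(m.group(1)); // only the link inside the quotes
}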
And the output is:
<a href='link1'>
link1
<a href='link2'>
link2
Please note that the lazy (reluctant) quantifier *? must be used so that each match stops at the end of a single tag. Group 0 is the entire match; group 1 is the first capturing group (the first pair of parentheses).
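To make the difference concrete, here is a small sketch that runs the same input through the reluctant pattern and a greedy variant. The class name QuantifierDemo and the helper firstGroups are made up for this illustration; only Pattern and Matcher come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class QuantifierDemo {
    // Collects capture group 1 from every match of regex in input.
    static List<String> firstGroups(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "foo <a href='link1'>bar</a> baz <a href='link2'>qux</a> foo";
        // Reluctant: each match stops at the first closing '> it can reach.
        System.out.println(firstGroups("<a href='(.*?)'>", html));
        // prints [link1, link2]
        // Greedy: .* runs to the last '> in the input, swallowing both tags.
        System.out.println(firstGroups("<a href='(.*)'>", html));
        // prints [link1'>bar</a> baz <a href='link2]
    }
}
```

With the greedy form there is only one match, and its group spans everything from the first link to the second, which is why the reluctant form is needed here.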
A note to consider:
Using regular expressions to pull values from HTML is always a mistake. HTML syntax is a lot more complex than it may first appear, and it's very easy for a page to catch out even a very complex regular expression.
Use an HTML parser instead. See also "What are the pros and cons of the leading Java HTML parsers?"
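Since the question rules out external libraries, one parser option worth knowing is the lenient HTML parser that ships with the JDK in the javax.swing.text.html packages. None of the answers above propose it, so treat the following as a rough sketch only: the class name JdkLinkExtractor and the method extractLinks are made up for illustration, and this parser was written for old HTML (3.2), so it may not cope with every modern page.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, not part of the original answers.
class JdkLinkExtractor {
    // Returns the href value of every <a> start tag the parser reports.
    static List<String> extractLinks(String html) throws IOException {
        final List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };
        // The last argument tells the parser to ignore charset declarations.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return links;
    }

    public static void main(String[] args) throws IOException {
        String html = "foo <a href='link1'>bar</a> baz <a href=\"link2\">qux</a> foo";
        System.out.println(extractLinks(html));
    }
}
```

Unlike the regex, this does not depend on the quote style or on href being the first attribute of the tag, because the parser hands over the attributes already separated.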
Upvotes: 5