Reputation: 6197
I want to match some links in web content. I know I can use file_get_contents(url) to fetch the content in PHP. How would I do this in JavaScript? As for the regular expression, suppose the link looks like this:
<a href="someurl/something" id="someid">contents</a>
How can I use a JavaScript regular expression to match this (match only once, non-greedy)? I tried this:
/^\<a href=\"someurl\/something\" id=\"someid\"\>(+?)\<\/a\>$/
but it doesn't work. Can someone help? Thanks!
Upvotes: 0
Views: 373
Reputation: 16524
The DOM and jQuery suggestions are better, but if you still want to use a regex, try this:
/^<a href=".*?" id=".*?">(.*?)<\/a>$/
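A minimal sketch of how a pattern like this might be used in JavaScript, assuming the page source is already in a string (the `html` variable and its contents are made up for illustration). The `^` and `$` anchors are dropped here so the link can sit anywhere in the text, and capture groups are added around the attribute values:

```javascript
// Hypothetical input: page source already loaded into a string.
const html = '<p>intro</p><a href="someurl/something" id="someid">contents</a><p>more</p>';

// Lazy quantifiers (.*?) stop at the first closing quote and the first </a>.
const linkPattern = /<a href="(.*?)" id="(.*?)">(.*?)<\/a>/;

// Without the g flag, exec() returns only the first match, or null if none.
const match = linkPattern.exec(html);
const linkText = match ? match[3] : null;
console.log(linkText);
```

Because there is no `g` flag, this matches only once, which is what the question asks for.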
Upvotes: 0
Reputation: 1413
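Try this (the snippet is Java, but the pattern itself is what matters):
boolean foundMatch = false;
try {
    // Note: String.matches() succeeds only if the pattern matches the entire string
    foundMatch = subjectString.matches("(?im)<a[^>]*href=(\"[^\"]*\"|'[^']*'|[^\\s>]*)[^>]*>.*?</a>");
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}
It matches href values in double quotes, in single quotes, or with no quotes at all: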
<a href="someurl/something" id="someid">contents</a>
<a href='someurl/something' id='someid'>contents</a>
<a href=someurl/something id=someid>contents</a>
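Since the question asks about JavaScript, here is a rough port of that pattern to a JavaScript regex literal. This is an assumption about how it would translate, not the answerer's code: JavaScript has no inline `(?im)` flags, so `i` moves to the end of the literal, `m` is dropped because the pattern has no anchors, and a capture group is added around the link text.

```javascript
// Port of the pattern above to a JavaScript regex literal (illustrative only).
const anchorPattern = /<a[^>]*href=("[^"]*"|'[^']*'|[^\s>]*)[^>]*>(.*?)<\/a>/i;

const samples = [
  '<a href="someurl/something" id="someid">contents</a>',
  "<a href='someurl/something' id='someid'>contents</a>",
  '<a href=someurl/something id=someid>contents</a>',
];

// Collect the raw href value (quotes included, if any) and the link text.
const results = samples.map((s) => {
  const m = anchorPattern.exec(s);
  return m ? { href: m[1], text: m[2] } : null;
});
console.log(results);
```

Note that the captured href still carries its surrounding quotes for the first two samples, so strip them if you need the bare URL.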
Upvotes: 0
Reputation: 4821
You might as well create the elements with jQuery:
var elements = $(html);
// Note: .find() searches descendants only, so it skips <a> elements at the top level of the html string
var links = elements.find('a');
links.each(function(i, link){
    // Do the regexp matching in here if you only want to search for specific URLs
});
In bigger documents, using the DOM is much quicker than running a regexp over the whole thing as text.
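One way the per-link matching could look, sketched as a plain function so it works with any array-like collection of elements (for example `$(html).find('a').toArray()` or `document.querySelectorAll('a')`). The name `filterLinksByHref` and the sample pattern are hypothetical:

```javascript
// Hypothetical helper: keep only the links whose href attribute matches a pattern.
// `links` is any array-like of elements that expose getAttribute().
// Pass a pattern without the g flag, since test() is stateful on global regexes.
function filterLinksByHref(links, pattern) {
  return Array.from(links).filter((link) => {
    const href = link.getAttribute('href');
    return href !== null && pattern.test(href);
  });
}

// In a browser this might be called as:
//   filterLinksByHref(document.querySelectorAll('a'), /^someurl\//);
```

The regex here only has to deal with a single attribute value, which is far less fragile than matching whole tags in raw markup.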
Upvotes: 0
Reputation: 526533
I'd highly recommend using a library like jQuery to get the element, and then getting its contents via a .text() call. It's much simpler and more reliable than trying to parse HTML with regex.
Upvotes: 3
Reputation: 81384
You should know that parsing HTML with regex is not the best way to solve this problem, and if you have access to a live DOM of the page, you should use DOM methods instead. That is, you should use
document.getElementById('someid').innerHTML // this will return 'contents'
instead of a regex.
Upvotes: 4