Reputation: 325
I need to capture the name of an HTML anchor tag with regex and PHP, so that from an anchor such as <a name="hello">text</a> I get "hello" (the name of the anchor).
I tried this:
$regex = '/(?<=name\=")#([^]+?)#(?=")/i';
preg_match_all($regex, $content, $data);
print_r($data);
Tailing the Apache error log shows:
PHP Warning: preg_match_all(): Compilation failed: missing terminating ] for character class at offset 26
I also tried:
$regex = '/(?<=name\=")([^]+?)(?=")/i';
$regex = '/(?<=name\=")[^]+?(?=")/i';
which are basically the same. I guess I'm missing something, probably a silly slash or something like that, but I'm not sure what.
I would appreciate any help. Thanks.
SOLVED
OK, thanks to @stillstanding and @Gordon I've managed to do this with DOMDocument, which is much simpler. For the record, here is the snippet:
$dom = new DOMDocument;
$dom->loadHTML($content);
foreach( $dom->getElementsByTagName('a') as $node ) {
echo $node->getAttribute( 'name' );
}
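The loop above echoes an empty string for every link that has no name attribute. As a minimal sketch of one way to skip those, assuming a hypothetical sample string in place of the real $content, DOMXPath can select only the anchors that carry a name, and libxml_use_internal_errors() keeps imperfect markup from flooding the error log:

```php
<?php
// Hypothetical sample input; in the question this would be $content.
$content = '<p><a name="hello">text</a> <a href="#hello">jump</a></p>';

$dom = new DOMDocument;
// Collect libxml parse warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$dom->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$names = [];
// Select only <a> elements that have a name attribute.
foreach ($xpath->query('//a[@name]') as $node) {
    $names[] = $node->getAttribute('name');
}

print_r($names);
```

Only the first anchor ends up in $names; the href-only link is ignored.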
Upvotes: 1
Views: 1072
Reputation: 317177
This will only work for the exact string <a name="[variable]"> (a string, not an element: regexes have no notion of elements or attributes, and they cannot parse HTML). See the links below your question for alternative approaches.
$text = '
<a name="anything">something</a> blabla
<span name="something">something</span> blabla
<a name="something else">something else</a> blabla
';
preg_match_all('#<a name="(.*)">#', $text, $matches);
print_r($matches);
gives
Array
(
[0] => Array
(
[0] => <a name="anything">
[1] => <a name="something else">
)
[1] => Array
(
[0] => anything
[1] => something else
)
)
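The pattern above works on this sample because each tag sits on its own line and . does not match a newline by default. As a rough sketch with a hypothetical one-line input, the greedy (.*) swallows everything up to the last "> on the line, while a negated character class stops at the closing quote:

```php
<?php
// Hypothetical input with two anchors on the same line.
$text = '<a name="first">one</a> <a name="second">two</a>';

// Greedy .* runs to the last "> on the line, so both tags collapse into one match.
preg_match_all('#<a name="(.*)">#', $text, $greedy);

// [^"]* cannot cross the closing quote of the attribute value.
preg_match_all('#<a name="([^"]*)">#', $text, $bounded);

print_r($greedy[1]);
print_r($bounded[1]);
```

The same limitation as before applies: either pattern still only recognises the exact <a name="..."> string.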
Marking this CW because topic has been beaten to death
Upvotes: 0
Reputation: 80443
Your [^]+? is a syntax error. What is it supposed to be? A minimal match of 1 or more instances, preferring fewer, of what thing? If you mean a non-meta ^, then you should just call it \^. But if you mean any character that is not a ^, you could use [^^], which you may write [^\^] if that seems clearer to you.
If you mean something that is not at the beginning of the line, well, that's somewhat different. You could use a negative lookbehind, perhaps. But more information is needed.
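For what it's worth, PCRE treats a ] that comes directly after [^ as a literal member of the class, which is why [^] never closes and the compiler reports a missing terminating ]. If the intent was "any character other than a double quote" (an assumption, since the question does not say), a minimal sketch on a hypothetical sample string would be:

```php
<?php
// Hypothetical sample input standing in for $content.
$content = '<a name="hello">text</a>';

// [^"]+ is a complete class: one or more characters that are not a double quote.
$regex = '/(?<=name=")[^"]+(?=")/i';
preg_match_all($regex, $content, $data);

print_r($data[0]);
```

This compiles and captures the attribute value, but it will also match name="..." on any other tag or in plain text.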
If you are really bound and determined to use a regex to split HTML tags, then you should at least do it properly.
Upvotes: 0
Reputation: 71230
$regex = '/(#[a-z_.-][a-z0-9+$_.-]*)/i';
preg_match($regex, $yourstring, $result);
e.g.:
$yourstring = "somelink.html#this";
$regex = '/(#[a-z_.-][a-z0-9+$_.-]*)/i';
preg_match($regex, $yourstring, $result);
echo substr($result[0], 1); // drop the leading '#'
Would print 'this'
However, the parse_url function is probably a better bet for getting this info from an address:
http://www.php.net/manual/en/function.preg-match.php#96339
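As a small sketch of the parse_url route, using the same sample string as the regex example, the PHP_URL_FRAGMENT component returns just the part after the #:

```php
<?php
$yourstring = 'somelink.html#this';

// PHP_URL_FRAGMENT asks parse_url() for only the part after the '#'.
$fragment = parse_url($yourstring, PHP_URL_FRAGMENT);

echo $fragment;
```

This avoids hand-maintaining a character class for what a fragment may contain.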
If you wish to replace the actual anchor tags within a doc, see here
Upvotes: 1