Reputation: 325

Getting anchor name with php regex

I need to capture the name of an HTML anchor tag with regex and PHP, so that from the text I get "hello" (the name of the anchor).

I tried this:

$regex  = '/(?<=name\=")#([^]+?)#(?=")/i';  
preg_match_all($regex, $content, $data);
print_r($data);

Tailing the Apache error log shows:

PHP Warning: preg_match_all(): Compilation failed: missing terminating ] for character class at offset 26

I also tried:

$regex  = '/(?<=name\=")([^]+?)(?=")/i'; 
$regex  = '/(?<=name\=")[^]+?(?=")/i';

which are basically the same. I guess I'm missing something, probably a silly slash or something like that, but I'm not sure what.

I would appreciate any help. Thanks.

SOLVED

OK, thanks to @stillstanding and @Gordon I've managed to do this with DOMDocument, which is much simpler. For the record, here is the snippet:

$dom = new DOMDocument;
$dom->loadHTML($content);
foreach ($dom->getElementsByTagName('a') as $node) {
    echo $node->getAttribute('name');
}
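
A slightly more defensive variant of the snippet above, as a minimal sketch: the function name `anchorNames` and the sample input are assumptions for illustration, not part of the original solution. It skips anchors that have no name attribute and keeps libxml from emitting PHP warnings on sloppy markup.

```php
<?php
// Hypothetical helper; the name anchorNames is an assumption for illustration.
function anchorNames(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml parse errors internally instead of raising PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $names = [];
    foreach ($dom->getElementsByTagName('a') as $node) {
        // getAttribute() returns an empty string when the attribute is absent.
        $name = $node->getAttribute('name');
        if ($name !== '') {
            $names[] = $name;
        }
    }
    return $names;
}

// Assumed sample input: one named anchor and one link without a name.
print_r(anchorNames('<a name="hello">text</a> <a href="#hello">jump</a>'));
```

Returning an array rather than echoing keeps the names separate, whereas the original loop prints them back to back.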

Upvotes: 1

Answers (4)

Gordon

Reputation: 317177

This will only work for the exact <a name="[variable]"> string (a string, not an element: regexes know nothing about elements or attributes, and they cannot parse HTML). See the links below your question for alternative approaches.

$text = '
    <a name="anything">something</a> blabla
    <span name="something">something</span>  blabla
    <a name="something else">something else</a>  blabla
';

preg_match_all('#<a name="(.*)">#', $text, $matches);
print_r($matches);

gives

Array
(
    [0] => Array
        (
            [0] => <a name="anything">
            [1] => <a name="something else">
        )

    [1] => Array
        (
            [0] => anything
            [1] => something else
        )
)

Marking this CW because the topic has been beaten to death.

Upvotes: 0

tchrist

Reputation: 80443

Your [^]+? is a syntax error. What is it supposed to be? A minimal match of 1 or more instances, preferring less, of what thing? If you mean a nonmeta ^, then you should just call it \^. But if you mean any character that is not a ^, you could use [^^], which you may write [^\^] if that seems clearer to you.

If you mean a position that is not at the beginning of the line, well, that’s somewhat different. You could use a lookbehind negation, perhaps. But more information is needed.
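
If the intent was "any character that is not a double quote", which is only a guess at what the question wanted, the class needs that character after the caret. A minimal sketch with an assumed sample string:

```php
<?php
// Assumed sample input; not taken from the question.
$content = '<a name="hello">text</a> and <a name="world">more</a>';

// [^"]+ means "one or more characters that are not a double quote".
// The lookbehind and lookahead keep name=" and the closing quote out of the match.
$regex = '/(?<=name=")[^"]+(?=")/i';

preg_match_all($regex, $content, $data);
print_r($data[0]);
```

This compiles because the class now has a member before its closing bracket. It still matches a name attribute on any tag, not only on anchors.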

If you are really bound and determined to use a regex to split HTML tags, then you should at least do it properly.

Upvotes: 0

SW4

Reputation: 71230

$regex = '/(#[a-z_.-][a-z0-9+$_.-]*)/';
preg_match($regex, $yourstring, $result);

e.g.:

$yourstring = "somelink.html#this";
$regex = '/(#[a-z_.-][a-z0-9+$_.-]*)/';
preg_match($regex, $yourstring, $result);
echo substr($result[0], 1);  // strip the leading "#"

Would return 'this'

However, the parse_url function is probably a better bet to get this info from an address:

http://www.php.net/manual/en/function.preg-match.php#96339
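
A minimal sketch of the parse_url route, reusing the example string from above:

```php
<?php
$yourstring = "somelink.html#this";

// PHP_URL_FRAGMENT asks parse_url() for the part after the "#".
// It returns null when the URL has no fragment.
$fragment = parse_url($yourstring, PHP_URL_FRAGMENT);

echo $fragment;
```

No pattern is needed, so there is nothing to escape or delimit.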

If you wish to replace the actual anchor tags within a doc, see here

Upvotes: 1

bcosca

Reputation: 17555

Use DOMXPath for this along with DOMDocument or SimpleXML. But never, ever use regex patterns!
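
A minimal sketch of the DOMXPath approach; the sample markup and variable names are assumptions for illustration:

```php
<?php
// Assumed sample input: a named anchor, a named span, and an unnamed link.
$content = '<a name="hello">text</a> <span name="other">x</span> <a href="#hello">jump</a>';

$dom = new DOMDocument();
// Collect libxml parse errors internally instead of raising PHP warnings.
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($content);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);

$names = [];
// //a/@name selects the name attribute node of every <a> element that has one.
foreach ($xpath->query('//a/@name') as $attr) {
    $names[] = $attr->nodeValue;
}

print_r($names);
```

The XPath expression does the filtering, so the span's name attribute and the anchor without a name are never visited.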

Upvotes: 2
