Ahoura Ghotbi
Ahoura Ghotbi

Reputation: 2896

how to create a pattern for preg_match_all

I tried googling this but I couldnt find anything clear about it. first I was hoping someone could help me write a pattern to get the info between these tags :

<vboxview leftinset="10" rightinset="0" stretchiness="1">    // CONTENT INSIDE HERE </vboxview>

and second, could you also please explain the pattern in details for each section and what it does and how you specify to get a certain part of the code.

Upvotes: 0

Views: 332

Answers (2)

Useless Code
Useless Code

Reputation: 12402

As it has been mentioned in the comments it is usually not a good idea to try to extract things from HTML with regular expressions. If you ever want to switch to a more bulletproof method here's a quick example of how you could easily extract the information using the DOMDocument API.

<?php
function get_vboxview($html) {

    $output = array();

    // Create a new DOM object
    $doc = new DOMDocument;

    // load a string in as html
    $doc->loadHTML($html);

    // create a new Xpath object to query the document with
    $xpath = new DOMXPath($doc);

    // an xpath query that looks for a vboxview node anywhere in the DOM
    // with an attribute named leftinset set to 10, an attribute named rightinset
    // set to 0 and an attribute named stretchiness set to 1
    $query = '//vboxview[@leftinset=10 and @rightinset=0 and @stretchiness=1]';

    // query the document
    $matches = $xpath->query($query);

    // loop through each matching node
    // and the textContent to the output
    foreach ($matches as $m) {
            $output[] = $m->textContent;
    }

    return $output;
}
?>

Better yet if there is guaranteed to be only one vboxview in your input (also assuming you have control of the HTML) you could add an id attribute to vboxview and cut the code down to a shorter and more generalized function.

<?php
function get_node_text($html, $id) {
    // Create a new DOM object
    $doc = new DOMDocument;

    // load a string in as html
    $doc->loadHTML($html);

    // return the textContent of the node with the id $id
    return $doc->getElementById($id)->textContent;
}
?>

Upvotes: 1

Code Jockey
Code Jockey

Reputation: 6721

See my comment on the question for my rant on SGML-based languages and regex...

Now to my answer.

If you know there will not be any other HTML/XML elements inside the tag in question, then this will work quite well:

<vboxview\s(?P<vboxviewAttributes>(\\>|[^>])*)>(?P<vboxviewContent>(\\<|[^<])*)</vboxview>

Broken down, this expression says:

<vboxview                  # match `<vboxview` literally
\s+                        # match at least one whitespace character
(?P<vboxviewAttributes>    # begin capture (into a group named "vboxViewAttributes")
   (\\>|[^>])*             #    any number of (either `\>` or NOT `>`)
)                          # end capture
>                          # match a `>` character
(?P<vboxviewContent>       # begin capture (into a group named "vboxViewContent")
   (\\<|[^<])*             #    any number of (either `\<` or NOT `<`)
)                          # end capture
</vboxview>                # match `</vboxview>` literally

You will need to escape and > characters inside the source as \> or even better as HTML/XML entities

If there are going to be nested constructs inside, then you are either going to start running into problems with regex, or you will have already decided to use another method that does not involve regex - either way is sufficient!

Upvotes: 1

Related Questions