Reputation: 2896
I tried Googling this but couldn't find anything clear about it. First, I was hoping someone could help me write a regex pattern to get the content between these tags:
<vboxview leftinset="10" rightinset="0" stretchiness="1"> // CONTENT INSIDE HERE </vboxview>
Second, could you please explain the pattern in detail: what each section does, and how you specify which part of the text to capture?
Upvotes: 0
Views: 332
Reputation: 12402
As mentioned in the comments, it is usually not a good idea to extract things from HTML with regular expressions. If you ever want to switch to a more robust method, here's a quick example of how you could extract the information with PHP's DOMDocument API:
<?php
function get_vboxview($html) {
    $output = array();
    // Create a new DOM object
    $doc = new DOMDocument;
    // Load the string as HTML; vboxview is not a standard HTML tag, so collect
    // libxml's "Tag vboxview invalid" warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    // Create a new XPath object to query the document with
    $xpath = new DOMXPath($doc);
    // An XPath query that looks for a vboxview node anywhere in the DOM
    // with an attribute named leftinset set to 10, an attribute named rightinset
    // set to 0 and an attribute named stretchiness set to 1
    $query = '//vboxview[@leftinset=10 and @rightinset=0 and @stretchiness=1]';
    // Query the document
    $matches = $xpath->query($query);
    // Loop through each matching node
    // and add its textContent to the output
    foreach ($matches as $m) {
        $output[] = $m->textContent;
    }
    return $output;
}
?>
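To see what the XPath filter actually selects, here is a small standalone run of the same query against a hypothetical input containing two vboxview elements, only one of which has the requested attribute values. It is a sketch that assumes the HTML arrives as a plain string; the sample markup is made up for illustration.

```php
<?php
// Hypothetical sample input: two vboxview elements, only the first of which
// has leftinset="10", rightinset="0" and stretchiness="1".
$html = '<div>'
    . '<vboxview leftinset="10" rightinset="0" stretchiness="1">wanted text</vboxview>'
    . '<vboxview leftinset="5" rightinset="0" stretchiness="1">other text</vboxview>'
    . '</div>';

// vboxview is not a known HTML tag, so collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// All three attribute conditions must hold; the second element fails on leftinset.
$nodes = $xpath->query('//vboxview[@leftinset=10 and @rightinset=0 and @stretchiness=1]');

$texts = [];
foreach ($nodes as $node) {
    $texts[] = $node->textContent;
}
print_r($texts); // a one-element array containing "wanted text"
```

XPath compares the attribute strings numerically here (for example "10" = 10), so leftinset="10" and leftinset="10.0" would both match.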
Better yet, if there is guaranteed to be only one vboxview in your input (and assuming you have control of the HTML), you could add an id attribute to the vboxview and cut the code down to a shorter, more general function:
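<?php
function get_node_text($html, $id) {
    // Create a new DOM object
    $doc = new DOMDocument;
    // Load the string as HTML, collecting libxml warnings about unknown tags
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    // Return the textContent of the node with the id $id,
    // or null if no element has that id
    $node = $doc->getElementById($id);
    return $node === null ? null : $node->textContent;
}
?>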
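A quick usage sketch of the id-based lookup follows. The id value "sidebar" is a hypothetical one you would add to your own markup. It also shows that getElementById returns null when no element carries the requested id, which is why guarding against null before reading textContent is worthwhile.

```php
<?php
// Hypothetical markup: the id "sidebar" was added by hand to the vboxview.
$html = '<vboxview id="sidebar" leftinset="10" rightinset="0" stretchiness="1">content inside</vboxview>';

libxml_use_internal_errors(true);  // collect "Tag vboxview invalid" warnings quietly
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// In documents loaded with loadHTML, "id" attributes are registered as IDs.
$node = $doc->getElementById('sidebar');
$text = $node !== null ? $node->textContent : null;

// An id that does not exist yields null rather than an element.
$missing = $doc->getElementById('no-such-id');

var_dump($text, $missing);
```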
Upvotes: 1
Reputation: 6721
See my comment on the question for my rant on SGML-based languages and regex...
Now to my answer.
If you know there will not be any other HTML/XML elements inside the tag in question, then this will work quite well:
<vboxview\s+(?P<vboxviewAttributes>(\\>|[^>])*)>(?P<vboxviewContent>(\\<|[^<])*)</vboxview>
Broken down, this expression says:
<vboxview # match `<vboxview` literally
\s+ # match at least one whitespace character
(?P<vboxviewAttributes> # begin capture (into a group named "vboxviewAttributes")
(\\>|[^>])* # any number of (either `\>` or NOT `>`)
) # end capture
> # match a `>` character
(?P<vboxviewContent> # begin capture (into a group named "vboxviewContent")
(\\<|[^<])* # any number of (either `\<` or NOT `<`)
) # end capture
</vboxview> # match `</vboxview>` literally
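To answer the "how do you get a certain part" question concretely: in PHP, preg_match fills its third argument with every capture, and named groups are available under their names. The sketch below wraps the pattern in "~" delimiters (my choice, so the "/" in </vboxview> needs no escaping) and uses a nowdoc string so the backslashes reach PCRE unchanged. The sample input is the one from the question.

```php
<?php
// The pattern above, in "~" delimiters; a nowdoc performs no escape processing.
$pattern = <<<'REGEX'
~<vboxview\s+(?P<vboxviewAttributes>(\\>|[^>])*)>(?P<vboxviewContent>(\\<|[^<])*)</vboxview>~
REGEX;

$html = '<vboxview leftinset="10" rightinset="0" stretchiness="1"> CONTENT INSIDE HERE </vboxview>';

if (preg_match($pattern, $html, $m)) {
    // Named groups are indexed by name (and also by number).
    echo $m['vboxviewContent'], "\n";     // the text between the tags, spaces included
    echo $m['vboxviewAttributes'], "\n";  // leftinset="10" rightinset="0" stretchiness="1"
}
```

If the input may contain several vboxview elements, preg_match_all with the same pattern returns every occurrence, and $m['vboxviewContent'] then becomes an array.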
You will need to escape any > characters inside the attributes as \> and any < characters inside the content as \<, or, even better, as the HTML/XML entities &gt; and &lt;.
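The following sketch shows why escaping matters, using hypothetical content that contains a literal "<". Unescaped, the pattern finds no match. Backslash-escaped, it matches but leaves the backslash in the capture. Entity-escaped with htmlspecialchars, it matches cleanly and htmlspecialchars_decode restores the original text, which is one reason to prefer entities.

```php
<?php
// Same pattern as above, in "~" delimiters; the nowdoc keeps backslashes literal.
$pattern = <<<'REGEX'
~<vboxview\s+(?P<vboxviewAttributes>(\\>|[^>])*)>(?P<vboxviewContent>(\\<|[^<])*)</vboxview>~
REGEX;

// Hypothetical content containing a literal "<".
$raw = 'if a < b then';

// 1. Unescaped: [^<] stops at "<", which does not begin </vboxview>, so no match.
$unescaped = '<vboxview leftinset="10">' . $raw . '</vboxview>';
$plainResult = preg_match($pattern, $unescaped);

// 2. Backslash-escaped: the (\\<|[^<]) alternative consumes "\<" as one unit,
//    but the captured text still contains the backslash.
$backslashed = '<vboxview leftinset="10">' . str_replace('<', '\<', $raw) . '</vboxview>';
preg_match($pattern, $backslashed, $m1);

// 3. Entity-escaped: "&lt;" contains no "<", so [^<] matches it directly,
//    and decoding the capture gives back the original text.
$entities = '<vboxview leftinset="10">' . htmlspecialchars($raw) . '</vboxview>';
preg_match($pattern, $entities, $m2);
$decoded = htmlspecialchars_decode($m2['vboxviewContent']);

var_dump($plainResult, $m1['vboxviewContent'], $decoded);
```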
If there are going to be nested constructs inside, you will either start running into problems with regex, or you will already have decided to use another method that does not involve regex. Either way works!
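Here is a sketch of the nesting problem, using a hypothetical input with one vboxview inside another. Because the content group cannot contain "<", the regex never matches the outer element and only reports the inner one. A DOM parser, by contrast, builds the nested tree, so the outer element's textContent includes everything inside it.

```php
<?php
$pattern = <<<'REGEX'
~<vboxview\s+(?P<vboxviewAttributes>(\\>|[^>])*)>(?P<vboxviewContent>(\\<|[^<])*)</vboxview>~
REGEX;

// Hypothetical nested input: an inner vboxview inside the outer one.
$html = '<vboxview leftinset="10">outer <vboxview leftinset="5">inner</vboxview> tail</vboxview>';

// Regex: the outer match fails at the inner "<vboxview", so only "inner" is found.
preg_match_all($pattern, $html, $m);
print_r($m['vboxviewContent']);

// DOM: the parser nests the elements, so the first vboxview is the outer one.
libxml_use_internal_errors(true);  // collect warnings about the unknown tag
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$outer = $doc->getElementsByTagName('vboxview')->item(0);
echo $outer->textContent, "\n";    // text of the outer element and everything inside it
```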
Upvotes: 1