Reputation: 3097
I have a large set of HTML files that I need to parse the <? and ?> tags out of, keeping in mind <?xml and the fact that an opening <?php tag doesn't need an ending tag... EOF counts too.
My regular expression knowledge is admittedly lacking: /<\?[^(\?>)]*\?>/
Example HTML:
<?
function trans($value) {
// Make sure it does not translate the function call itself
}
?>
<!-- PHP
code -->
<div id='test' <?= $extraDiv ?>>
<?= trans("hello"); ?>
<? if ($something == 'hello'): ?>
<? if ($something == 'hello'): ?>
<p>Hello</p>
<? endif; ?>
<?php
// Some multiline PHP stuff
echo trans("You are \"great'"); // I threw some quotes in to toughen the test
echo trans("Will it still work with two");
echo trans('and single quotes');
echo trans("multiline
stuff
");
echo trans("from array('test')",array('test'));
$counter ++;
?>
<p>Smart <?= $this->translation ?> time</p>
<p>Smart <?=$translation ?> time</p>
<p>Smart <?= $_POST['translation'] ?> time</p>
</div>
<?
trans("This php tag has no end");
Hoped for Array:
[0] => "<?
function trans($value) {
// Make sure it does not translate the function call itself
}
?>",
[1] => "<?= $extraDiv ?>",
[2] => etc...
Upvotes: 1
Views: 111
Reputation: 586
It looks like what you're looking for is lookahead and lookbehind. These assertions let you require text around a match while leaving it out of the final result.
So first, you'd want to change your regex to this (with a lazy .*? and the s flag in place of the character class, which excludes the single characters (, ?, > and ) rather than the sequence ?>):
'/(?<=<\?).*?(?=\?>)/s'
For EOF, you'd use the $ anchor. Therefore:
'/(?<=<\?).*?(?=\?>|$)/s'
I haven't tested this, but I think it should do what you're looking for, or at the very least point you in the right direction.
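To make the lookaround idea concrete, here is a minimal sketch in Python (an assumption, since the thread does not name a host language; the sample string is made up). It uses a lazy `.*?` with dot-all rather than a negated character class, because `[^(\?\>)]` excludes the individual characters `(`, `?`, `>` and `)` instead of the sequence `?>`. `\Z` stands in for EOF.

```python
import re

# Hypothetical sample input; the last tag is deliberately left unclosed.
sample = "<p><?= $a ?></p>\n<? if ($b): ?>x<? endif; ?>\n<?php echo 1;"

# The lookbehind requires "<?" before the match and the lookahead requires
# "?>" (or end of input) after it; neither is included in the match itself.
inner = re.compile(r"(?<=<\?).*?(?=\?>|\Z)", re.DOTALL)

matches = inner.findall(sample)
for m in matches:
    print(repr(m))
```

Note that the array hoped for in the question keeps the `<?` and `?>` delimiters, which a lookaround-based match leaves out, so this form only fits if the tag contents alone are wanted.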
Upvotes: 0
Reputation: 10786
No, that isn't how character classes work: [^(\?>)] excludes each of the characters (, ?, > and ) individually, not the sequence ?>. Luckily you don't need to worry about that, because we can use a ? to make the * quantifier non-greedy. I'll also add an s to the end so that . can also match newlines, which it can't by default.
/<\?(.*?)\?>/s
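As a hedged sketch of how the lazy pattern could be extended to the other two requirements in the question, the Python snippet below (Python and the name `extract_php_blocks` are assumptions, not from the thread) also accepts end of input as a terminator and skips `<?xml`, assuming XML declarations are meant to be ignored. The sample is a shortened, made-up variant of the question's HTML.

```python
import re

# "<?" not followed by "xml", then lazily anything (newlines included)
# up to the first "?>" or, failing that, the end of the input.
PHP_TAG = re.compile(r"<\?(?!xml\b).*?(?:\?>|\Z)", re.DOTALL)

def extract_php_blocks(html):
    """Return every <? ... ?> span, delimiters included."""
    return PHP_TAG.findall(html)

# Hypothetical input: an XML declaration, three closed tags, one unclosed tag.
sample = (
    '<?xml version="1.0"?>\n'
    "<div id='test' <?= $extraDiv ?>>\n"
    "<? if ($something == 'hello'): ?>\n"
    "<p>Hello</p>\n"
    "<? endif; ?>\n"
    "<?\n"
    'trans("This php tag has no end");'
)

blocks = extract_php_blocks(sample)
for block in blocks:
    print(repr(block))
```

One limitation to keep in mind: a `?>` that appears inside a PHP string literal would still end the match early, since the regex has no notion of PHP quoting.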
Upvotes: 2