Reputation: 125
The original question is located here; the current question is about avoiding one specific problem.
I have this code, which works perfectly with the html_1 data:
from pyparsing import nestedExpr, originalTextFor
html_1 = '''
<html>
<head>
<title><?php echo "title here"; ?></title>
<head>
<body>
<h1 <?php echo "class='big'" ?>>foo</h1>
</body>
</html>
'''
html_2 = '''
<html>
<head>
<title><?php echo "title here"; ?></title>
<head>
<body>
<h1 <?php echo $tpl->showStyle(); ?>>foo</h1>
</body>
</html>
'''
nested_angle_braces = nestedExpr('<', '>')
# for match in nested_angle_braces.searchString(html):
#     print(match)
# nested_angle_braces_with_h1 = nested_angle_braces().addCondition(
#     lambda tokens: tokens[0][0].lower() == 'h1')
nested_angle_braces_with_h1 = originalTextFor(
    nested_angle_braces().addCondition(lambda tokens: tokens[0][0].lower() == 'h1')
)
nested_angle_braces_with_h1.addParseAction(lambda tokens: tokens[0] + 'MY_TEXT')
print(nested_angle_braces_with_h1.transformString(html_1))
The result for html_1 is:
<html>
<head>
<title><?php echo "title here"; ?></title>
<head>
<body>
<h1 <?php echo "class='big'" ?>>MY_TEXTfoo</h1>
</body>
</html>
This is correct: everything is placed as expected, and MY_TEXT lands in the right place (inside the h1 element).
But let's see the result for html_2:
<html>
<head>
<title><?php echo "title here"; ?></title>
<head>
<body>
<h1 <?php echo $tpl->showStyle(); ?>MY_TEXT>foo</h1>
</body>
</html>
Now there is an error: MY_TEXT is placed inside the h1 tag's attribute area, because the PHP code contains an angle bracket in "$tpl->".
How can I fix this? I need to get this result for that region:
<h1 <?php echo $tpl->showStyle(); ?>>MY_TEXTfoo</h1>
Upvotes: 1
Views: 56
Reputation: 63729
The solution requires defining a special expression for PHP tags, since they are what confuses our simple nestedExpr.
from pyparsing import Literal, SkipTo, Word, nestedExpr, originalTextFor, printables

# define an expression for a PHP tag
php_tag = Literal('<?') + 'php' + SkipTo('?>', include=True)
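As a quick sanity check, php_tag on its own should consume everything through the closing '?>', including the '->' that trips up the plain nestedExpr. This is only a minimal sketch; the sample string is an assumption, taken from the tail of the h1 line in html_2:

```python
from pyparsing import Literal, SkipTo, originalTextFor

# same PHP tag expression as above
php_tag = Literal('<?') + 'php' + SkipTo('?>', include=True)

# sample text (assumed for illustration): the PHP part of the h1 line in html_2
sample = '<?php echo $tpl->showStyle(); ?>>foo</h1>'

# originalTextFor returns the matched source text as a single string
matched = originalTextFor(php_tag).parseString(sample)[0]
print(matched)
```

The match should stop right after '?>', leaving the tag's real closing '>' for the enclosing expression.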
We'll need more than simple strings now for the opener and closer, including a negative lookahead when matching a '<' to make sure we aren't at the leading edge of a PHP tag:
# define expressions for opener and closer, such that we don't
# accidentally interpret a PHP tag as a nested expr
opener = ~php_tag + Literal("<")
closer = Literal(">")
If opener and closer aren't simple strings, then we need to give a content expression too. Our content will be very simple to define, just PHP tags or other Words of printables, excluding '<' and '>' (you'll end up wrapping this all back up in originalTextFor anyway):
# define nested_angle_braces to potentially contain PHP tag, or
# some other printable (not including '<' or '>' chars)
nested_angle_braces = nestedExpr(opener, closer,
                                 content=php_tag | Word(printables, excludeChars="<>"))
Now if I wrap nested_angle_braces in originalTextFor and use searchString to scan html_2, I get:
for tag in originalTextFor(nested_angle_braces).searchString(html_2):
    print(tag)
['<html>']
['<head>']
['<title>']
['</title>']
['<head>']
['<body>']
['<h1 <?php echo $tpl->showStyle(); ?>>']
['</h1>']
['</body>']
['</html>']
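To get back to the original goal, the condition and parse action from the question should carry over unchanged to the new nested_angle_braces. The following is a sketch under that assumption (the 'h1' condition and the MY_TEXT parse action are copied from the question), and I have only checked it against html_2:

```python
from pyparsing import (Literal, SkipTo, Word, nestedExpr,
                       originalTextFor, printables)

html_2 = '''
<html>
<head>
<title><?php echo "title here"; ?></title>
<head>
<body>
<h1 <?php echo $tpl->showStyle(); ?>>foo</h1>
</body>
</html>
'''

# PHP-aware pieces, as defined above
php_tag = Literal('<?') + 'php' + SkipTo('?>', include=True)
opener = ~php_tag + Literal('<')
closer = Literal('>')
nested_angle_braces = nestedExpr(
    opener, closer,
    content=php_tag | Word(printables, excludeChars='<>'))

# keep only tags whose first token is 'h1' (condition taken from the question)
nested_angle_braces_with_h1 = originalTextFor(
    nested_angle_braces().addCondition(
        lambda tokens: tokens[0][0].lower() == 'h1')
)

# append the marker text right after the matched opening tag
nested_angle_braces_with_h1.addParseAction(lambda tokens: tokens[0] + 'MY_TEXT')

result = nested_angle_braces_with_h1.transformString(html_2)
print(result)
```

Because the whole PHP tag is now matched as a single piece of content, the '>' in "$tpl->" no longer closes the h1 tag early, so MY_TEXT should land after the tag's real closing '>'.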
Upvotes: 1