Reputation:
The following code helps me parse patterns to be used with the standard module re.
import sre_parse
pattern = r"(?P<TEST>test)\s+\w*(?P=TEST)|abcde"
parsedpattern = sre_parse.parse(pattern)
parsedpattern.dump()
In a terminal, this prints text that is easy to parse:
branch
  subpattern 1
    literal 116
    literal 101
    literal 115
    literal 116
  max_repeat 1 2147483647
    in
      category category_space
  max_repeat 0 2147483647
    in
      category category_word
  groupref 1
or
  literal 97
  literal 98
  literal 99
  literal 100
  literal 101
Is there an easy way to get this text as a string variable? I could reuse the code of the method dump, which can be obtained by applying inspect.getsourcelines to sre_parse.SubPattern thanks to the module inspect. But I'm hoping for a more direct solution, if there is one.
PS: I have not found any readable documentation about the module sre_parse. Do you know of any?
Upvotes: 0
Views: 1080
Reputation: 10360
You could always mess around with sys.stdout and redirect it to a variable, in a way:
import sre_parse
import sys

class PseudoStdout:
    def __init__(self):
        self.contents = ''

    def __enter__(self):  # this and __exit__ are for context management
        self.old_stdout = sys.stdout
        sys.stdout = self

    def __exit__(self, type_, value, traceback):
        sys.stdout = self.old_stdout

    def write(self, text):  # the method print calls, so this behaves like a file
        self.contents += text

pattern = r"(?P<TEST>test)\s+\w*(?P=TEST)|abcde"
parsedpattern = sre_parse.parse(pattern)
ps = PseudoStdout()
with ps:
    parsedpattern.dump()
print(repr(ps.contents))
Result:
'branch \n subpattern 1 \n literal 116 \n literal 101 \n literal 115 \n literal 116 \n max_repeat 1 65535 \n in \n category category_space\n max_repeat 0 65535 \n in \n category category_word\n groupref 1 \nor\n literal 97 \n literal 98 \n literal 99 \n literal 100 \n literal 101 \n'
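For what it's worth, on Python 3.4 and later the same redirection can probably be done without a hand-written class, using contextlib.redirect_stdout and io.StringIO from the standard library. This is only a sketch: the fallback import of re._parser is an assumption about where the parser lives on Python 3.11+, where sre_parse is deprecated, and that module is private, so it may move again. The names buffer and dumped are just for illustration.

```python
import contextlib
import io

try:
    # Assumption: on Python 3.11+ the parser lives in the private module re._parser.
    import re._parser as sre_parse
except ImportError:
    # Older Pythons still ship it as the top-level module sre_parse.
    import sre_parse

pattern = r"(?P<TEST>test)\s+\w*(?P=TEST)|abcde"
parsedpattern = sre_parse.parse(pattern)

# dump() writes through print, so temporarily pointing sys.stdout at an
# in-memory text buffer captures everything it emits.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    parsedpattern.dump()

dumped = buffer.getvalue()  # the whole dump as one string
print(repr(dumped))
```

The exact text differs between Python versions (newer ones print the operation names in upper case), so anything that parses dumped should not rely on the case of the names.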
It seems more straightforward, though, to just step through parsedpattern itself, which is already structured.
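Stepping through the structure could look roughly like this. It is a sketch based on how the parsed object appears to be laid out (a sequence of (op, av) pairs, where av may contain nested SubPattern objects), not on any documented API, so treat the details as assumptions. The helpers iter_ops and _descend are made up for this example, and the re._parser import is the same guess about Python 3.11+ as before.

```python
try:
    # Assumption: on Python 3.11+ the parser lives in the private module re._parser.
    import re._parser as sre_parse
except ImportError:
    import sre_parse


def iter_ops(node, depth=0):
    """Yield (depth, op_name) for every operation in a parsed pattern."""
    for op, av in node:
        name = str(op)
        yield depth, name
        if name.upper() == "IN":
            # The argument of IN is itself a list of (op, av) pairs.
            for item in iter_ops(av, depth + 1):
                yield item
        else:
            for item in _descend(av, depth + 1):
                yield item


def _descend(value, depth):
    """Look for nested SubPattern objects inside an operation's argument."""
    if isinstance(value, sre_parse.SubPattern):
        for item in iter_ops(value, depth):
            yield item
    elif isinstance(value, (list, tuple)):
        for element in value:
            for item in _descend(element, depth):
                yield item
    # Plain values (ints, None, constants) have nothing nested to report.


pattern = r"(?P<TEST>test)\s+\w*(?P=TEST)|abcde"
parsedpattern = sre_parse.parse(pattern)

ops = list(iter_ops(parsedpattern))
text = "\n".join("  " * depth + name for depth, name in ops)
print(text)
```

This only reports operation names and nesting depth. The arguments (character codes, repeat bounds, group numbers) are available in av if you need them.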
Upvotes: 3