Reputation: 300
I am trying to use Python to go through many thousands of lines of SAS code. I want to extract certain parts of the code so they can be printed or passed to another function.
The SAS code I am looking at might look like this:
"""%macro msg (name= some_macro) ;
%put Hello World, my name is &name ;
%mend ;"""
I want to capture what is between the first and the last line, i.e. between the %macro line and the %mend ; line, so "%put Hello World, my name is &name ;" would be returned as a group.
I can achieve this capture with:
re.compile(r"\%macro\s*?.*?\s*?\((.*)\)\s*?;\n(.*?)\n\s*\%mend\s*;")
The (.*?)\n part seems to match the line I want.
NOTE: I am using a lot of \s* because the SAS code contains whitespace in seemingly random places.
However, when the macro body spans more lines (it could be 2 or many more), my pattern no longer matches. For example:
"""%macro msg (name= some_macro) ;
%put Hello World, my name is &name ;
%let something happen
%do something else
%mend ;"""
Here I want to return "%put Hello World, my name is &name ; %let something happen %do something else" all as one group. I have tried adding the quantifiers * and +, but I do not know how to make a quantifier apply to the whole line rather than just to the last character it follows. Here is an example:
r"\%macro\s*?.*?\s*?\((.*)\)\s*?;\n(.*?)\n+?\s*\%mend\s*;"
Here I am trying to indicate that the line (.*?)\n could be repeated between 1 and unlimited times, and that I want to capture that group.
I have also tried re.MULTILINE and re.DOTALL, using ^ and $ and dots for the line-end characters, but did not achieve the desired result either.
Please help me understand this area better. Thanks
Upvotes: 1
Views: 187
Reputation: 163342
You could use a single capture group and match all the lines that do not start with %mend.
The percent sign does not need escaping. Also note that \s can match a newline, which may not be what you intend.
%macro.*\r?\n((?:(?!\s*%mend).*\r?\n)+)\s*%mend ;
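On the note about \s: the short check below shows that \s* can run across line breaks. If you only want spaces and tabs within a line, one common idiom is the negated class [^\S\r\n], which matches whitespace other than \r and \n. The name horizontal is just an illustrative variable, not something from your code.

```python
import re

# \s includes newline characters, so \s* can span several lines.
assert re.fullmatch(r"a\s*b", "a \n\n b") is not None

# [^\S\r\n] reads as "whitespace, but not \r or \n".
horizontal = r"[^\S\r\n]*"
assert re.fullmatch("a" + horizontal + "b", "a \t b") is not None
assert re.fullmatch("a" + horizontal + "b", "a \n b") is None
```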
Explanation
%macro.*\r?\n
Match %macro followed by the rest of the line and a newline
(
Capture group 1
(?:
Non capturing group
(?!\s*%mend)
Negative lookahead, assert that what is directly to the right is not optional whitespace followed by %mend
.*\r?\n
Match the whole line and a newline
)+
Close the non capturing group and repeat it 1+ times to match at least a single line
)
Close capture group 1
\s*%mend ;
Match optional whitespace followed by %mend ;
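The same breakdown can be written directly into the pattern with re.VERBOSE, which may be easier to maintain. This is only a sketch of the pattern above; the name macro_body is my own, and note that under re.VERBOSE a literal space has to be written as [ ] (or escaped), because unescaped whitespace in the pattern is ignored.

```python
import re

macro_body = re.compile(
    r"""
    %macro.*\r?\n        # the %macro line: rest of the line plus its newline
    (                    # capture group 1: the macro body
      (?:                # non-capturing group: one body line
        (?!\s*%mend)     # the line must not start with optional whitespace and %mend
        .*\r?\n          # consume the whole line and its newline
      )+                 # one or more body lines
    )                    # end of capture group 1
    \s*%mend[ ];         # the closing line; [ ] is a literal space in verbose mode
    """,
    re.VERBOSE,
)
```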
For example
import re
# test_str is the string holding the SAS source to search
pattern = re.compile(r"%macro.*\r?\n((?:(?!\s*%mend).*\r?\n)+)\s*%mend ;")
print(pattern.findall(test_str))
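Here is a self-contained version run against the multi-line sample from the question. It is a sketch with two assumptions of mine: the closing "%mend ;" is relaxed to "%mend\s*;" because the question mentions irregular whitespace, and the helper macro_bodies (a hypothetical name) joins the captured lines with single spaces to get the one-string result the question asks for.

```python
import re

# Same pattern as above, except "%mend ;" is relaxed to "%mend\s*;"
# (an assumption, based on the irregular whitespace in the SAS code).
pattern = re.compile(r"%macro.*\r?\n((?:(?!\s*%mend).*\r?\n)+)\s*%mend\s*;")

sas_code = """%macro msg (name= some_macro) ;
%put Hello World, my name is &name ;
%let something happen
%do something else
%mend ;"""


def macro_bodies(source):
    """Return each macro body with its lines stripped and joined by spaces."""
    return [
        " ".join(line.strip() for line in body.splitlines())
        for body in pattern.findall(source)
    ]


print(macro_bodies(sas_code))
# ['%put Hello World, my name is &name ; %let something happen %do something else']
```

Because findall returns one entry per macro, the same call also works on a file containing many macros.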
Upvotes: 1