Reputation: 598
I need to search something like this:
lines = """package p_dio_bfm is
procedure setBFMCmd (
variable pin : in tBFMCmd
);
end p_dio_bfm; -- end package;
package body p_dio_bfm is
procedure setBFMCmd (
variable pin : in tBFMCmd
) is
begin
bfm_cmd := pin;
end setBFMCmd;
end p_dio_bfm;"""
I need to extract the package name, i.e. p_dio_bfm and the package declaration, i.e. the part between "package p_dio_bfm is" and FIRST "end p_dio_bfm;"
The problem is that the package declaration may end with either "end p_dio_bfm;" or "end package;". So I tried the following "OR" regex, which:
- works for packages ending with "end package;"
- does not work for packages ending with "end pck_name;"
pattern = re.compile("package\s+(\w+)\s+is(.*)end\s+(package|\1)\s*;")
match = pattern.search(lines)
The problem is the (package|\1) part of the regex, where I want to catch either the word "package" or the matched package name.
UPDATE: I have added a complete example that I hope clarifies the problem:
import re
lines1 = """package p_dio_bfm is
procedure setBFMCmd (
variable pin : in tBFMCmd
);
end p_dio_bfm;
package body p_dio_bfm is
procedure setBFMCmd (
variable pin : in tBFMCmd
) is
begin
bfm_cmd := pin;
end setBFMCmd;
end p_dio_bfm;"""
lines2 = """package p_dio_bfm is
procedure setBFMCmd (
variable pin : in tBFMCmd
);
end package;
package body p_dio_bfm is
procedure setBFMCmd (
variable pin : in tBFMCmd
) is
begin
bfm_cmd := pin;
end setBFMCmd;
end package;"""
lines1 = lines1.replace('\n', ' ')
print lines1
pattern = re.compile("package\s+(\w+)\s+is(.*)end\s+(package|\1)\s*;")
match = pattern.search(lines1)
print match
lines2 = lines2.replace('\n', ' ')
print lines2
match = pattern.search(lines2)
print match
In both cases I expect, using a single regex, to get back this part:
"""procedure setBFMCmd (
variable pin : in tBFMCmd
);"""
without the \n chars which I have removed.
Upvotes: 4
Views: 193
Reputation: 3410
How about:
>>> for row in re.findall(
... r'package(?:\s.*?)(?P<needle>[^\s]+)\s+is\s+(.*?)end\s+(?:package|(?P=needle));',
... lines,
... re.S
... ):
... print '{{{', row[1], '}}}'
...
{{{ procedure setBFMCmd (
variable pin : in tBFMCmd
);
}}}
{{{ procedure setBFMCmd (
variable pin : in tBFMCmd
) is
begin
bfm_cmd := pin;
end setBFMCmd;
}}}
I took the liberty of not filtering exactly as @mihai-hangiu asked: the output also includes the second block (the package body).
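If only the declaration is wanted, one option is to capture the optional body keyword and label each block so the caller can filter. This is a minimal sketch rather than a definitive solution; BLOCK, SAMPLE and list_blocks are names made up for illustration, and the pattern assumes the same two endings as in the question.

```python
import re

# Raw string, so (?P=name) and \s reach the regex engine untouched.
BLOCK = re.compile(
    r"\bpackage\s+(?P<body>body\s+)?(?P<name>\w+)\s+is\b"
    r"(?P<content>.*?)"
    r"\bend\s+(?:package|(?P=name))\s*;",
    re.DOTALL,
)

SAMPLE = """package p_dio_bfm is
procedure setBFMCmd (
variable pin : in tBFMCmd
);
end p_dio_bfm; -- end package;
package body p_dio_bfm is
procedure setBFMCmd (
variable pin : in tBFMCmd
) is
begin
bfm_cmd := pin;
end setBFMCmd;
end p_dio_bfm;"""


def list_blocks(source):
    """Return (kind, name, content) for every package block found."""
    blocks = []
    for m in BLOCK.finditer(source):
        kind = "body" if m.group("body") else "declaration"
        # Collapse newlines and repeated spaces into single spaces.
        content = " ".join(m.group("content").split())
        blocks.append((kind, m.group("name"), content))
    return blocks


blocks = list_blocks(SAMPLE)
declarations = [b for b in blocks if b[0] == "declaration"]
```

Here declarations holds only the first block, while blocks keeps both, so the filtering decision stays outside the regex.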
Upvotes: 2
Reputation: 107297
Your regex doesn't match as written. Without the re.DOTALL flag, .* won't match newline characters, so you can use [\s\S]* instead:
r'package ([^\s]+)\s+is([\s\S]*)end\s+(package|\1)\s*;'
See demo https://regex101.com/r/tZ3uH0/1
But there are some other problems here. One is that your string contains two package blocks. Also, as a more elegant and efficient way, you can use the re.DOTALL flag, which makes the '.' special character match any character at all, including a newline. So you can write your regex like this:
pattern = re.compile("package\s+(\w+)\s+is(.*)end\s+(package|\1)\s*;",re.DOTALL)
But this will still match just the first block:
>>> match = pattern.search(lines)
>>> print match.group(0)
package p_dio_bfm is
procedure setBFMCmd (
variable pin : in tBFMCmd
);
end p_dio_bfm; -- end package;
>>> print match.group(1)
p_dio_bfm
>>> print match.group(2)
procedure setBFMCmd (
variable pin : in tBFMCmd
);
end p_dio_bfm; --
>>> print match.group(3)
package
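One thing worth checking about the group(3) result above: the pattern is written without the r prefix, so Python converts "\1" into the control character chr(1) before the regex is compiled, and the alternative can then only ever match the word package. A small sketch of the difference (the text, head and tail names are made up for illustration):

```python
import re

# In a normal string literal, "\1" is an octal escape: the character chr(1).
assert "\1" == "\x01"
# In a raw string it stays backslash + "1", which re reads as a backreference.
assert r"\1" == "\\1"

text = "package p_dio_bfm is procedure x; end p_dio_bfm;"

# The same pattern twice; only the way \1 is written differs.
head = r"package\s+(\w+)\s+is(.*)end\s+(package|"
tail = r")\s*;"
without_raw = re.compile(head + "\1" + tail, re.DOTALL)
with_raw = re.compile(head + r"\1" + tail, re.DOTALL)

no_match = without_raw.search(text)  # None: chr(1) never occurs in the text
match = with_raw.search(text)        # matches up to "end p_dio_bfm;"
```

With the raw string, group(3) of match is the package name rather than the word package.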
To match all blocks, you need to allow for an optional word such as body before the package name, and make the second group non-greedy:
package\s+(?:\w+\s+?)?([^\s]+)\s+is(.*?)end\s+(package|\1)\s*;
See demo https://regex101.com/r/tZ3uH0/3
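To get just the declaration for both endings from Python, something along these lines should work. It is a sketch under a few assumptions: the pattern is a raw string, "package body" is skipped with a negative lookahead, and re.IGNORECASE is added because VHDL keywords are case-insensitive. DECLARATION, extract_declaration and the two sample strings are illustrative names, not part of the question.

```python
import re

# Raw string so \1 stays a backreference; (?!body\b) skips "package body".
DECLARATION = re.compile(
    r"\bpackage\s+(?!body\b)(\w+)\s+is\b(.*?)\bend\s+(?:package|\1)\s*;",
    re.DOTALL | re.IGNORECASE,
)


def extract_declaration(source):
    """Return (name, declaration) for the first package declaration, or None."""
    match = DECLARATION.search(source)
    if match is None:
        return None
    # Collapse newlines and repeated spaces into single spaces.
    return match.group(1), " ".join(match.group(2).split())


ENDS_WITH_NAME = """package p_dio_bfm is
procedure setBFMCmd (
variable pin : in tBFMCmd
);
end p_dio_bfm;
package body p_dio_bfm is
end p_dio_bfm;"""

# Same text, but every block is closed with "end package;" instead.
ENDS_WITH_KEYWORD = ENDS_WITH_NAME.replace("end p_dio_bfm;", "end package;")

result_name = extract_declaration(ENDS_WITH_NAME)
result_keyword = extract_declaration(ENDS_WITH_KEYWORD)
```

Both results hold the package name and the declaration text with the newlines collapsed, so the same pattern covers either way of closing the package.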
Upvotes: 3