Reputation: 621
I am weak at writing regular expressions, so I need some help with this one. I need a regular expression that matches a section header such as SECTION 7.01 and then the sub-section (a) that follows it. The word SECTION can be followed by any number, such as 6.1, 7.1, or 2.1.
Examples:
SECTION 7.01. Events of Default. If any of the following events
("Events of Default") shall occur:
(a) any Borrower shall fail to pay any principal of any Loan when and
as the same shall become due and payable, whether at the due date thereof
or at a date fixed for prepayment thereof or otherwise;
I am trying to write a regular expression that gives me groups containing the following:
Group 1
SECTION 7.01. Events of Default. If any of the following events
("Events of Default") shall occur:
Group 2
(a) any Borrower shall fail to pay any principal of any Loan when and
as the same shall become due and payable, whether at the due date thereof
or at a date fixed for prepayment thereof or otherwise;
There can also be more points after (a), such as (b), and so on.
Please help me write this regular expression.
Upvotes: 2
Views: 163
Reputation: 21
To help you learn, in case you have to write another regex later, I would recommend you check out the docs below: https://docs.python.org/3/howto/regex.html#regex-howto
This is the "easy" introduction to Python regex. Essentially, you define a pattern, using the above link as a reference to build it up as you need it, and then apply the pattern to whatever text needs processing.
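As a minimal sketch of that workflow (the pattern and the names `SECTION_HEADER` and `text` are illustrative assumptions, not something taken from the question), defining a pattern and applying it could look like this:

```python
import re

# Hypothetical pattern: the word SECTION, whitespace, then a number such as 7.01 or 6.1.
SECTION_HEADER = re.compile(r"SECTION\s+(\d+\.\d+)")

text = "SECTION 7.01. Events of Default. If any of the following events"

# search() scans the string and returns the first match, or None if there is none.
match = SECTION_HEADER.search(text)
if match:
    print(match.group(0))  # the whole matched text
    print(match.group(1))  # only the captured section number
```

Here `group(0)` is the full match and `group(1)` is the first parenthesized group, which is the same mechanism the question's "Group 1" and "Group 2" rely on.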
Upvotes: 0
Reputation: 3525
You can use the following approach; however, it makes several assumptions. First, section headers must begin with SECTION and end with a colon (:). Second, sub-sections must begin with a letter in parentheses, such as (a), and end with a semicolon.
import re

def extract_groups(s):
    # Join the stripped lines with single spaces so that a match can span
    # line breaks without gluing the last and first words together
    sanitized_string = ' '.join(line.strip() for line in s.split('\n'))
    # Section headers: from SECTION up to the first colon
    sections = re.findall(r'SECTION.*?:', sanitized_string)
    # Sub-sections: from a parenthesized letter up to the first semicolon
    sub_sections = re.findall(r'\([a-z]\).*?;', sanitized_string)
    return sections, sub_sections

Sample Output:
>>> s = """SECTION 7.01. Events of Default. If any of the following events
("Events of Default") shall occur:
(a) Whether at the due date thereof
or at a date fixed for prepayment thereof or otherwise;
(b) Test;
SECTION 7.02. Second section:"""
>>> print(extract_groups(s))
(['SECTION 7.01. Events of Default. If any of the following events ("Events of Default") shall occur:', 'SECTION 7.02. Second section:'],
['(a) Whether at the due date thereof or at a date fixed for prepayment thereof or otherwise;', '(b) Test;'])
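The function above returns two flat lists, so it does not record which sub-sections belong to which section. A possible sketch of a grouped variant follows, under the same assumptions (headers end with a colon, sub-sections end with a semicolon); the name `group_by_section` is hypothetical and not part of the answer above:

```python
import re

def group_by_section(text):
    """Return a list of (header, [sub_sections]) tuples, one per section."""
    # Collapse every run of whitespace (including newlines) into one space.
    flat = " ".join(text.split())
    # Cut the text into chunks that each start at a SECTION header and stop
    # right before the next header (or at the end of the string).
    chunks = re.findall(r"SECTION \d+\.\d+\..*?(?=SECTION \d+\.\d+\.|$)", flat)
    result = []
    for chunk in chunks:
        # Assumption: the header runs up to the first colon in the chunk.
        header = re.match(r".*?:", chunk)
        if header is None:
            continue  # no colon found, so this chunk has no recognizable header
        # Assumption: each sub-section runs from "(letter)" to the next semicolon.
        subs = re.findall(r"\([a-z]\).*?;", chunk[header.end():])
        result.append((header.group(0), subs))
    return result


sample = """SECTION 7.01. Events of Default. If any of the following events
("Events of Default") shall occur:
(a) Whether at the due date thereof
or at a date fixed for prepayment thereof or otherwise;
(b) Test;
SECTION 7.02. Second section:"""

for header, subs in group_by_section(sample):
    print(header)
    for sub in subs:
        print("   ", sub)
```

With this shape, each header travels together with its own (a), (b), and so on, and a section without sub-sections simply gets an empty list.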
Upvotes: 3
Reputation: 3764
I got this to work:
import re

s = """
SECTION 7.01. Events of Default. If any of the following events
("Events of Default") shall occur:
(a) any Borrower shall fail to pay any principal of any Loan when and
as the same shall become due and payable, whether at the due date thereof
or at a date fixed for prepayment thereof or otherwise;
"""
r = r'(SECTION 7\.01\.[\s\w\.()"]*:)[\s]*(\(a\)[\s\w,]*;)'
mo = re.search(r, s)
print('Group 1: ' + mo.group(1))
print('Group 2: ' + mo.group(2))
If you wanted to make it generic, so that it matches any section number (including ones like 6.1) and any sub-section letter, you could try:
r = r'(SECTION \d+\.\d+\.[\s\w.()"]*:)\s*(\([a-z]\)[\s\w,]*;)'
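One caveat with whitelisted character classes such as `[\s\w,]` is that the match fails as soon as the text contains a character outside the list, for example a hyphen, an apostrophe, or a quote. A hedged alternative, assuming the header contains no colon before its final one and the sub-section contains no semicolon before its final one, is to use negated classes instead. The name `PATTERN` and the sample text below are made up for illustration:

```python
import re

# Assumed variant: "anything except ':'" for the header and
# "anything except ';'" for the sub-section, instead of whitelists.
PATTERN = re.compile(r'(SECTION \d+\.\d+\.[^:]*:)\s*(\([a-z]\)[^;]*;)')

sample = """SECTION 6.1. Non-Payment. If any of the following events shall occur:
(b) the Borrower's payment is more than 30 days late (a "Late Payment");
"""

mo = PATTERN.search(sample)
if mo:
    print('Group 1: ' + mo.group(1))
    print('Group 2: ' + mo.group(2))
```

Like the patterns above, this captures only the first sub-section after the header; further points such as the next letter would need a second pass over the remaining text.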
Upvotes: 0