Reputation: 621

Writing a regular expression in Python

I am weak at writing regular expressions, so I need some help with this one. I need a regular expression that matches a section heading such as SECTION 7.01 and then the sub-section (a) that follows it.

The word SECTION can be followed by any number, such as 6.1, 7.1 or 2.1.

Examples:

SECTION 7.01. Events of Default. If any of the following events
("Events of Default") shall occur:
          (a) any Borrower shall fail to pay any principal of any Loan when and
     as the same shall become due and payable, whether at the due date thereof
     or at a date fixed for prepayment thereof or otherwise;

I am trying to write a regular expression that gives me groups containing the following:

Group 1

SECTION 7.01. Events of Default. If any of the following events
("Events of Default") shall occur:

Group 2

(a) any Borrower shall fail to pay any principal of any Loan when and
     as the same shall become due and payable, whether at the due date thereof
     or at a date fixed for prepayment thereof or otherwise;

There can also be more points after (a), such as (b) and so on.

Please help me write this regular expression.

Upvotes: 2

Answers (3)

Shawn

Reputation: 21

To help you learn, in case you need to write more regular expressions later, I recommend the Python Regular Expression HOWTO: https://docs.python.org/3/howto/regex.html#regex-howto

This is the "easy" introduction to Python regex. Essentially, you define a pattern, using the link above as a reference while you build it, and then apply that pattern to whatever text needs processing.
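As a minimal sketch of that workflow (the pattern and the sample string below are illustrative assumptions, not a full solution to the question): compile a pattern once, then call one of its methods, such as search, on the text.

```python
import re

# Illustrative pattern: the word SECTION followed by a number
# such as 7.01 or 6.1. The parentheses capture just the number.
section_number = re.compile(r'SECTION (\d+\.\d+)')

text = 'SECTION 7.01. Events of Default.'
match = section_number.search(text)  # returns None when nothing matches
if match:
    print(match.group(1))  # the captured section number
```

Checking the result for None before calling group() avoids an AttributeError on text that does not contain a section heading.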

Upvotes: 0

ospahiu

Reputation: 3525

You can use the following approach, but it makes several assumptions. Section headers must begin with SECTION and end with a colon (:). Sub-sections must begin with a lowercase letter in parentheses, such as (a), and end with a semicolon.

import re

def extract_groups(s):
    # Flatten the text: strip each line and join with a space so that
    # words on either side of a line break are not fused together.
    sanitized_string = ' '.join(line.strip() for line in s.split('\n'))
    # Section headers: from SECTION up to the first colon.
    sections = re.findall(r'SECTION.*?:', sanitized_string)
    # Sub-sections: from a marker like (a) up to the first semicolon.
    sub_sections = re.findall(r'\([a-z]\).*?;', sanitized_string)
    return sections, sub_sections

Sample Output:

>>> s = """SECTION 7.01. Events of Default. If any of the following events
("Events of Default") shall occur:
          (a) Whether at the due date thereof
     or at a date fixed for prepayment thereof or otherwise;

          (b) Test;
SECTION 7.02. Second section:"""
>>> print(extract_groups(s))
(['SECTION 7.01. Events of Default. If any of the following events ("Events of Default") shall occur:', 'SECTION 7.02. Second section:'], 
['(a) Whether at the due date thereof or at a date fixed for prepayment thereof or otherwise;', '(b) Test;'])
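The function above returns two flat lists, so it does not record which sub-section belongs to which section. One possible way to keep them together is sketched below. The name group_by_section is hypothetical, and the sketch assumes that "SECTION" followed by a digit only ever appears at the start of a section header, never as a cross-reference inside the body text.

```python
import re

def group_by_section(text):
    """Map each section header to the list of its lettered sub-sections."""
    flat = ' '.join(line.strip() for line in text.splitlines())
    grouped = {}
    # Each chunk runs from one "SECTION <digit>" up to the next one
    # (or to the end of the flattened text).
    for chunk in re.findall(r'SECTION \d.*?(?=SECTION \d|$)', flat):
        header = re.match(r'SECTION.*?:', chunk)
        if header is None:
            continue  # no closing colon, so the header cannot be delimited
        grouped[header.group(0)] = re.findall(r'\([a-z]\).*?;', chunk)
    return grouped

sample = """SECTION 7.01. Events of Default. If any of the following events
("Events of Default") shall occur:
          (a) Whether at the due date thereof
     or at a date fixed for prepayment thereof or otherwise;

          (b) Test;
SECTION 7.02. Second section:"""

for header, items in group_by_section(sample).items():
    print(header, len(items))
```

With the sample string, the first header maps to two sub-sections and the second header maps to an empty list.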

Upvotes: 3

Hanshan

Reputation: 3764

I got this to work:

import re

s = """
SECTION 7.01. Events of Default. If any of the following events
("Events of Default") shall occur:
          (a) any Borrower shall fail to pay any principal of any Loan when and
     as the same shall become due and payable, whether at the due date thereof
     or at a date fixed for prepayment thereof or otherwise;
"""

r = r'(SECTION 7\.01\.[\s\w\.()"]*:)[\s]*(\(a\)[\s\w,]*;)'
mo = re.search(r, s)
print('Group 1: ' + mo.group(1))
print('Group 2: ' + mo.group(2))

To make it generic, so that it matches any section number of the form 7.01 (one digit, a dot, then two digits) and any lettered sub-section, you could try:

r = r'(SECTION [1-9]\.[0-9]{2}\.[\s\w\.()"]*:)[\s]*(\([a-z]\)[\s\w,]*;)'
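The generic pattern above expects exactly two digits after the dot and only allows word characters, whitespace and commas inside a sub-section, so a heading like SECTION 6.1 or a hyphen in the text would stop it from matching. A possible relaxation, sketched here with negated character classes, is shown below. The names SECTION_RE, section and first_item and the sample text are made up for illustration, and like the original it captures only the first sub-section.

```python
import re

# Hypothetical relaxation of the pattern above:
#   \d+\.\d+  accepts 7.01 as well as 6.1 or 12.3
#   [^:]*:    runs to the first colon, whatever characters precede it
#   [^;]*;    runs to the first semicolon
SECTION_RE = re.compile(
    r'(?P<section>SECTION \d+\.\d+\.[^:]*:)\s*'
    r'(?P<first_item>\([a-z]\)[^;]*;)'
)

# Made-up sample text, not taken from the question.
text = """
SECTION 6.1. Notices. The following rules apply:
     (a) notices must be in writing, hand-delivered or e-mailed;
"""

mo = SECTION_RE.search(text)
if mo is None:
    print('no match')
else:
    print('Group 1: ' + mo.group('section'))
    print('Group 2: ' + mo.group('first_item'))
```

The trade-off is that the header now ends at the first colon and the sub-section at the first semicolon, so a colon or semicolon inside the text itself would cut the group short.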

Upvotes: 0
