Python_beginner

Reputation: 99

Extract a number before or after a specific string from text using Python 3

How can I extract the value that comes before or after a specific string? And how can I extract only 12-digit numbers for the roll no?

import re

input_file = """my bday is on 04/01/1997 and
            frnd bday on 28/12/2018,
            account no is A000142116 and
            valid for 30 days for me and
            for my frnd only 4 DAYS.my roll no is 130302101786
            and register number is 1600523941. Admission number is
            181212001103"""

# Loop over lines; looping over the string itself would yield single characters
for line in input_file.splitlines():
    m1 = re.findall(r"\d{1,2}/\d{1,2}/\d{4}", line)  # dates such as 04/01/1997
    m2 = re.findall(r"A(\d+)", line)     # digits following a capital "A"
    m3 = re.findall(r"(\d+)days", line)  # digits immediately followed by "days"
    m4 = re.findall(r"(\d+)DAYS", line)  # digits immediately followed by "DAYS"
    m5 = re.findall(r"(\d+)", line)      # any run of digits (m6 and m7 use the same pattern)
    m6 = re.findall(r"(\d+)", line)
    m7 = re.findall(r"(\d+)", line)
    for date_n in m1:
        print(date_n)
    for account_no in m2:
        print(account_no)
    for valid_days in m3:
        print(valid_days)
    for frnd_DAYS in m4:
        print(frnd_DAYS)
    for roll_no in m5:
        print(roll_no)
    for register_no in m6:
        print(register_no)
    for admission_no in m7:
        print(admission_no)

Expected Output:

04/01/1997
28/12/2018
A000142116
30 days
4 DAYS
130302101786
1600523941
181212001103

Upvotes: 0

Views: 66

Answers (2)

Jan

Reputation: 43169

Use one expression for all of them:

\b[A-Z]?\d[/\d]*\b(?:\s+days)?

See a demo on regex101.com.
You would need to define the "account number" format more precisely here.
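A minimal Python sketch of the single-expression approach above. The answer does not say which flags the regex101 demo used; this sketch assumes `re.IGNORECASE`, which is what lets `(?:\s+days)?` also match "4 DAYS" (without it, only "4" is returned for that one). The variable names `text`, `pattern` and `matches` are illustrative.

```python
import re

# Sample text from the question, joined into one string
text = ("my bday is on 04/01/1997 and frnd bday on 28/12/2018, "
        "account no is A000142116 and valid for 30 days for me and "
        "for my frnd only 4 DAYS.my roll no is 130302101786 "
        "and register number is 1600523941. Admission number is "
        "181212001103")

# An optional letter (IGNORECASE makes [A-Z] match lowercase too), a digit,
# then more digits or slashes, optionally followed by whitespace and "days"
pattern = re.compile(r"\b[A-Z]?\d[/\d]*\b(?:\s+days)?", flags=re.IGNORECASE)

matches = pattern.findall(text)
print(matches)
# ['04/01/1997', '28/12/2018', 'A000142116', '30 days', '4 DAYS',
#  '130302101786', '1600523941', '181212001103']
```

Because the pattern is so permissive, any other token that starts with a digit (or a letter followed by a digit) would also be returned, which is why the account number format matters.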

Upvotes: 1

Tim Biegeleisen

Reputation: 521259

I would use a regex pattern with an alternation for all your possible matches:

\d{2}/\d{2}/\d{4}|\d+ days|[A-Z0-9]{10,}

This matches either a date, a number of days, or an ID such as the account, roll, register or admission number. For the IDs, I assume they are at least 10 characters long and consist only of letters and digits. With `re.IGNORECASE`, "4 DAYS" is matched as well.

import re

input_file = """my bday is on 04/01/1997 and
                frnd bday on 28/12/2018, 
                account no is A000142116 and 
                valid for 30 days for me and 
                for my frnd only 4 DAYS.my roll no is 130302101786
                and register number is 1600523941. Admission number is 
                181212001103"""

results = re.findall(r'\d{2}/\d{2}/\d{4}|\d+ days|[A-Z0-9]{10,}', input_file, flags=re.IGNORECASE)
print(results)

['04/01/1997', '28/12/2018', 'A000142116', '30 days', '4 DAYS', '130302101786',
 '1600523941', '181212001103']
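The question also asks for the value that follows a specific label, and for only the 12-digit roll number. A 12-digit filter alone is not enough with this input, because the admission number is also 12 digits long. A minimal sketch, not taken from either answer, anchors each capture group on the label text that precedes it. `LABELLED` and `extract_labelled` are hypothetical names, and the label wording is copied from the sample text, so it would need adjusting for other inputs.

```python
import re

# Sample text from the question, joined into one string
text = ("my bday is on 04/01/1997 and frnd bday on 28/12/2018, "
        "account no is A000142116 and valid for 30 days for me and "
        "for my frnd only 4 DAYS.my roll no is 130302101786 "
        "and register number is 1600523941. Admission number is "
        "181212001103")

# Hypothetical label -> pattern table; group 1 captures the value AFTER the label.
# \s+ also tolerates a line break between label and value.
LABELLED = {
    "account_no": r"account no is\s+([A-Z]\d+)",
    "roll_no": r"roll no is\s+(\d{12})\b",  # exactly 12 digits
    "register_no": r"register number is\s+(\d+)",
    "admission_no": r"admission number is\s+(\d+)",
}


def extract_labelled(text):
    """Return {name: value} for each label found in text (case-insensitive)."""
    found = {}
    for name, pattern in LABELLED.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            found[name] = match.group(1)
    return found


# A bare 12-digit filter also catches the admission number
twelve_digit = re.findall(r"\b\d{12}\b", text)

# Number BEFORE a string: digits followed by "days" in any case
days = re.findall(r"(\d+)\s+days\b", text, flags=re.IGNORECASE)

print(extract_labelled(text))
print(twelve_digit)  # ['130302101786', '181212001103']
print(days)          # ['30', '4']
```

Since `roll_no` requires exactly 12 digits followed by a word boundary, a roll number with more or fewer digits after "roll no is" is simply left out of the result instead of being truncated.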

Upvotes: 0
