Reputation: 99
How can I extract the text before and after specific strings, and match only 12-digit numbers for the roll number?
import re

input_file = """my bday is on 04/01/1997 and
frnd bday on 28/12/2018,
account no is A000142116 and
valid for 30 days for me and
for my frnd only 4 DAYS.my roll no is 130302101786
and register number is 1600523941. Admission number is
181212001103"""

# Iterate over lines of the text (iterating the string itself yields single characters).
for line in input_file.splitlines():
    m1 = re.findall(r"[\d]{1,2}/[\d]{1,2}/[\d]{4}", line)  # dates
    m2 = re.findall(r"A(\d+)", line)     # account number
    m3 = re.findall(r"(\d+)days", line)  # validity in days
    m4 = re.findall(r"(\d+)DAYS", line)
    m5 = re.findall(r"(\d+)", line)      # roll number
    m6 = re.findall(r"(\d+)", line)      # register number
    m7 = re.findall(r"(\d+)", line)      # admission number
    for date_n in m1:
        print(date_n)
    for account_no in m2:
        print(account_no)
    for valid_days in m3:
        print(valid_days)
    for frnd_DAYS in m4:
        print(frnd_DAYS)
    for roll_no in m5:
        print(roll_no)
    for register_no in m6:
        print(register_no)
    for admission_no in m7:
        print(admission_no)
Expected Output:
04/01/1997
28/12/2018
A000142116
30 days
4 DAYS
130302101786
1600523941
181212001103
Upvotes: 0
Views: 66
Reputation: 43169
Use one expression for all of them:
\b[A-Z]?\d[/\d]*\b(?:\s+days)?
See a demo on regex101.com.
You would need to specify the "account number" format more precisely here.
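As a rough sketch of how this single expression could be applied in Python: the variable names text, pattern and matches are only illustrative, and the re.IGNORECASE flag is my assumption so that the "days" suffix also matches "DAYS".

```python
import re

text = """my bday is on 04/01/1997 and
frnd bday on 28/12/2018,
account no is A000142116 and
valid for 30 days for me and
for my frnd only 4 DAYS.my roll no is 130302101786
and register number is 1600523941. Admission number is
181212001103"""

# One pattern: an optional leading letter, a digit, then any mix of digits
# and slashes, optionally followed by whitespace and the word "days".
# IGNORECASE is an assumption so that "DAYS" is picked up as well.
pattern = re.compile(r"\b[A-Z]?\d[/\d]*\b(?:\s+days)?", re.IGNORECASE)

# The pattern has no capturing groups, so findall returns the full matches.
matches = pattern.findall(text)
print(matches)
```

Because the expression is deliberately loose, it will also pick up any other standalone number in the text, which is why the account number format should be pinned down.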
Upvotes: 1
Reputation: 521259
I would use a regex pattern with an alternation for all your possible matches:
\d{2}/\d{2}/\d{4}|\d+ days|[A-Z0-9]{10,}
This matches either a date, a number of days, or an account number. For account numbers, I assume that they are at least 10 characters long and consist only of letters and digits. The re.IGNORECASE flag lets the "days" branch match "4 DAYS" as well.
import re

input_file = """my bday is on 04/01/1997 and
frnd bday on 28/12/2018,
account no is A000142116 and
valid for 30 days for me and
for my frnd only 4 DAYS.my roll no is 130302101786
and register number is 1600523941. Admission number is
181212001103"""
results = re.findall(r'\d{2}/\d{2}/\d{4}|\d+ days|[A-Z0-9]{10,}', input_file, flags=re.IGNORECASE)
print(results)
['04/01/1997', '28/12/2018', 'A000142116', '30 days', '4 DAYS', '130302101786',
'1600523941', '181212001103']
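If you need to know which number is which (for example, only the 12-digit roll number), one option is to anchor each pattern on the label that precedes the value. The following is a minimal sketch under some assumptions: the field names in field_patterns are hypothetical, and the digit counts (12 for roll and admission numbers, 10 for the register number) are inferred from the sample text only.

```python
import re

text = """my bday is on 04/01/1997 and
frnd bday on 28/12/2018,
account no is A000142116 and
valid for 30 days for me and
for my frnd only 4 DAYS.my roll no is 130302101786
and register number is 1600523941. Admission number is
181212001103"""

# Hypothetical field names mapped to label-anchored patterns.
# The capturing group holds the value that follows the label.
# Digit counts are assumptions taken from the sample input.
field_patterns = {
    "account_no": r"account no is\s+([A-Z]\d+)",
    "roll_no": r"roll no is\s+(\d{12})\b",
    "register_no": r"register number is\s+(\d{10})\b",
    "admission_no": r"admission number is\s+(\d{12})\b",
}

fields = {}
for name, pat in field_patterns.items():
    m = re.search(pat, text, flags=re.IGNORECASE)
    # Store None when a label is missing instead of raising an error.
    fields[name] = m.group(1) if m else None

# Dates and durations have no unique label, so collect all occurrences.
dates = re.findall(r"\b\d{2}/\d{2}/\d{4}\b", text)
durations = re.findall(r"\b\d+ days\b", text, flags=re.IGNORECASE)

print(fields)
print(dates)
print(durations)
```

The \s+ after each label also matches a line break, so the admission number is found even though it sits on the line after its label.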
Upvotes: 0