Reputation: 25
Please help me with the correct regex to get the output below.
import re
text="name of company.\nlastname, firstname - 12345\nDates of Service Diag Xref Proc Code Voucher POS/TOS Units Provider Al/As Other Plan Bill Amt Receipts Net"
Re_text = re.compile(r'([A-Za-z]+),\s+([A-Za-z]+)\s+([A-Za-z]+)( - +\d+$)')
Expected output:
lastname
firstname
middle initial (may not be present in every case)
12345
Upvotes: 2
Views: 673
Reputation: 626690
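You can use either of these patterns (the second is the Unicode-aware variant; it also needs re.M or a leading (?m)):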
(?m)([A-Za-z]+)(?:,\s+([A-Za-z]+))?,\s+([A-Za-z]+)\s+-\s+(\d+)$
([^\W\d_]+)(?:,\s+([^\W\d_]+))?,\s+([^\W\d_]+)\s+-\s+(\d+)$
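To illustrate the difference between the two letter classes, here is a minimal sketch. The input line with accented names is a made-up example, not taken from the question, and both patterns are compiled with re.M so that only the letter class differs.

```python
import re

# ASCII-only letters vs. any Unicode letter (word chars minus digits and underscore).
ascii_re = re.compile(
    r'([A-Za-z]+)(?:,\s+([A-Za-z]+))?,\s+([A-Za-z]+)\s+-\s+(\d+)$', re.M)
unicode_re = re.compile(
    r'([^\W\d_]+)(?:,\s+([^\W\d_]+))?,\s+([^\W\d_]+)\s+-\s+(\d+)$', re.M)

line = "Müller, José - 42"  # hypothetical input, assumed for illustration

ascii_match = ascii_re.search(line)      # accented letters break the ASCII class
unicode_match = unicode_re.search(line)  # Unicode-aware class accepts them

print(ascii_match)             # None
print(unicode_match.groups())  # ('Müller', None, 'José', '42')
```

If the names are guaranteed to be plain ASCII, the first pattern is enough; the second is the safer choice when names may contain accented letters.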
See the regex demo. Details:
- (?m) - the inline variant of the re.M flag
- ([A-Za-z]+) - Group 1: one or more ASCII letters ([^\W\d_]+ matches one or more Unicode letters)
- (?:,\s+([A-Za-z]+))? - an optional non-capturing group matching one or zero occurrences of a comma, one or more whitespaces, and then one or more letters captured into Group 2
- ,\s+ - a comma and one or more whitespaces
- ([A-Za-z]+) - Group 3: one or more ASCII letters ([^\W\d_]+ matches one or more Unicode letters)
- \s+-\s+ - a hyphen enclosed with one or more whitespaces
- (\d+) - Group 4: one or more digits
- $ - end of a line (with re.M in effect; without it, only the end of the string).
See the Python demo:
import re
text="name of company.\nlastname, firstname - 12345\nDates of Service Diag Xref Proc Code Voucher POS/TOS Units Provider Al/As Other Plan Bill Amt Receipts Net"
Re_text = re.compile(r'([A-Za-z]+)(?:,\s+([A-Za-z]+))?,\s+([A-Za-z]+)\s+-\s+(\d+)$', re.M)
m = Re_text.search(text)
if m:
    print(m.group(1))  # last name
    print(m.group(2))  # optional part; None when absent
    print(m.group(3))  # first name
    print(m.group(4))  # number
Output:
lastname
None
firstname
12345
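As a further sketch, the same pattern can be run over several lines with finditer to see Group 2 filled in when a third comma-separated name part is present. The sample records below are hypothetical, chosen only to exercise both cases; the last call shows what happens if re.M is left out.

```python
import re

pattern_src = r'([A-Za-z]+)(?:,\s+([A-Za-z]+))?,\s+([A-Za-z]+)\s+-\s+(\d+)$'
pattern = re.compile(pattern_src, re.M)

# Hypothetical records; the second one has three comma-separated name parts.
sample = "smith, john - 12345\ndoe, a, jane - 678\nno match here"

# One tuple of four groups per matching line.
rows = [m.groups() for m in pattern.finditer(sample)]
for row in rows:
    print(row)
# ('smith', None, 'john', '12345')
# ('doe', 'a', 'jane', '678')

# Without re.M, $ matches only at the end of the string
# (or just before a final newline), so nothing is found here.
no_flag = re.search(pattern_src, sample)
print(no_flag)  # None
```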
Upvotes: 1