Babu Murugan

Reputation: 25

Python regular expression to extract last name, first name, middle initial from text block

Please help me with the correct regex to get the output below.

import re
text="name of company.\nlastname, firstname - 12345\nDates of Service Diag Xref Proc Code Voucher POS/TOS Units Provider Al/As Other Plan Bill Amt Receipts Net"
Re_text = re.compile(r'([A-Za-z]+),\s+([A-Za-z]+)\s+([A-Za-z]+)( - +\d+$)')

Expected output:

lastname
firstname
Middle initial (may not be present in every case)
12345

Upvotes: 2

Views: 673

Answers (1)

Wiktor Stribiżew

Reputation: 626690

You can use either of these patterns (ASCII letters in the first, any Unicode letters in the second):

(?m)([A-Za-z]+)(?:,\s+([A-Za-z]+))?,\s+([A-Za-z]+)\s+-\s+(\d+)$
(?m)([^\W\d_]+)(?:,\s+([^\W\d_]+))?,\s+([^\W\d_]+)\s+-\s+(\d+)$

Details:

  • (?m) - the inline form of the re.M (re.MULTILINE) flag, which lets $ match at the end of each line
  • ([A-Za-z]+) - Group 1: any one or more ASCII letters ([^\W\d_]+ matches any one or more Unicode letters)
  • (?:,\s+([A-Za-z]+))? - an optional non-capturing group matching one or zero occurrences of a comma, one or more whitespaces, and then one or more letters captured into Group 2
  • ,\s+ - a comma and one or more whitespaces
  • ([A-Za-z]+) - Group 3: any one or more ASCII letters ([^\W\d_]+ matches any one or more Unicode letters)
  • \s+-\s+ - a hyphen enclosed with one or more whitespaces
  • (\d+) - Group 4: one or more digits
  • $ - end of a line (multiline mode is on; without it, $ would only match at the end of the whole string).
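
To make two points from the list above concrete, here is a minimal sketch. It assumes a shortened copy of the sample text from the question, and the variable names are only illustrative. It shows that the pattern finds nothing unless multiline mode is on, and that [^\W\d_]+ keeps non-ASCII letters that [A-Za-z]+ cuts off.

```python
import re

# Shortened copy of the sample text from the question (an assumption for brevity).
text = "name of company.\nlastname, firstname - 12345\nDates of Service"

pattern = r'([A-Za-z]+)(?:,\s+([A-Za-z]+))?,\s+([A-Za-z]+)\s+-\s+(\d+)$'

# Without re.M, $ cannot match after "12345" because more text follows it.
no_flag = re.search(pattern, text)

# With re.M (or the inline (?m) prefix), $ also matches at the end of each line.
with_flag = re.search(pattern, text, re.M)
inline_flag = re.search('(?m)' + pattern, text)

print(no_flag)               # None
print(with_flag.groups())    # ('lastname', None, 'firstname', '12345')
print(inline_flag.groups())  # ('lastname', None, 'firstname', '12345')

# ASCII-only class stops at the first non-ASCII letter; the Unicode class does not.
ascii_part = re.search(r'[A-Za-z]+', 'Stribiżew').group()
unicode_part = re.search(r'[^\W\d_]+', 'Stribiżew').group()
print(ascii_part)    # Stribi
print(unicode_part)  # Stribiżew
```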

See the Python demo:

import re
text="name of company.\nlastname, firstname - 12345\nDates of Service Diag Xref Proc Code Voucher POS/TOS Units Provider Al/As Other Plan Bill Amt Receipts Net"
Re_text = re.compile(r'([A-Za-z]+)(?:,\s+([A-Za-z]+))?,\s+([A-Za-z]+)\s+-\s+(\d+)$', re.M)
m = Re_text.search(text)
if m:
    print(m.group(1))
    print(m.group(2))
    print(m.group(3))
    print(m.group(4))

Output:

lastname
None
firstname
12345
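
Group 2 is None here because the sample line has no second comma-separated name. If the middle initial instead follows the first name after whitespace, as the pattern in the question suggests, a variant like the one below keeps the groups in the order lastname, firstname, middle initial, number. This is only a sketch: the "lastname, firstname M - 12345" layout is an assumption (the sample text contains no middle initial), and NAME_RE and parse_name are hypothetical names.

```python
import re

# Assumed layout: "lastname, firstname [middle] - digits".
# The middle part is optional and separated from the first name by whitespace.
NAME_RE = re.compile(
    r'([A-Za-z]+),\s+([A-Za-z]+)(?:\s+([A-Za-z]+))?\s+-\s+(\d+)$',
    re.M,
)

def parse_name(text):
    """Return (lastname, firstname, middle_or_None, number), or None if no line matches."""
    m = NAME_RE.search(text)
    return m.groups() if m else None

without_middle = parse_name("name of company.\nlastname, firstname - 12345\nDates of Service")
with_middle = parse_name("name of company.\nlastname, firstname M - 12345\nDates of Service")

print(without_middle)  # ('lastname', 'firstname', None, '12345')
print(with_middle)     # ('lastname', 'firstname', 'M', '12345')
```

With this ordering the first name is always Group 2, and only Group 3 needs a None check.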

Upvotes: 1
