Nish

Reputation: 35

Python regex for searching substring

I am trying to extract substrings from a long string in Python 3:

import re

def get_data(text):
    initials = text.split()[1]
    names = re.search(initials+'(.*)EMP',text).group(1).lstrip().title()

    return initials, names

I need the following outputs

x,y = get_data('J JS JOHN SMITH EMP 223456')
JS
John Smith

x,y = get_data('J JB JOE BLOGGS CONT 223456')
JB
Joe Bloggs

x,y = get_data('J JS JOHN SMITH 223456')
JS
John Smith

I can do it with either EMP or CONT, but I am struggling to handle EMP, CONT, or neither. I'm new to regex, so any help is appreciated.

Upvotes: 2

Views: 54

Answers (1)

anubhava

Reputation: 784878

No need to do a split and then search.

You can use a single regex in re.findall or re.search or re.match:

^\S+\s+(\S+)\s+(.+?)(?:\s+(?:EMP|CONT))?\s+\d+

RegEx Details:

  • ^: Start
  • \S+: Match 1+ non-whitespaces
  • \s+: Match 1+ whitespaces
  • (\S+): Match 1+ non-whitespaces and capture in group #1
  • \s+: Must be followed by 1+ whitespaces
  • (.+?): Lazily match 1+ of any character (as few as possible) and capture in group #2
  • (?:\s+(?:EMP|CONT))?: optionally match EMP or CONT after 1+ whitespaces
  • \s+\d+: Followed by 1+ whitespaces and 1+ digits
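
As a minimal sketch, the pattern above could be dropped into the question's function roughly as follows. The name get_data comes from the question; the compiled PATTERN constant and the ValueError raised when nothing matches are assumptions added for illustration, not part of the original code.

```python
import re

# The pattern from this answer, compiled once so it can be reused.
# PATTERN is a hypothetical name chosen for this sketch.
PATTERN = re.compile(r'^\S+\s+(\S+)\s+(.+?)(?:\s+(?:EMP|CONT))?\s+\d+')


def get_data(text):
    """Return (initials, names) extracted from a record line."""
    match = PATTERN.search(text)
    if match is None:
        # Assumption: treat a non-matching line as an error.
        raise ValueError(f'unrecognised record: {text!r}')
    initials, names = match.groups()
    # title() turns 'JOHN SMITH' into 'John Smith', as in the question.
    return initials, names.title()


print(get_data('J JS JOHN SMITH EMP 223456'))
print(get_data('J JB JOE BLOGGS CONT 223456'))
print(get_data('J JS JOHN SMITH 223456'))
```

Because group #2 is lazy and the EMP/CONT group is optional, the name capture stops just before EMP, CONT, or the trailing digits, whichever comes first.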

Upvotes: 2
