Reputation: 35
I am trying to extract substrings from a long string in Python 3:
import re

def get_data(text):
    initials = text.split()[1]
    names = re.search(initials + '(.*)EMP', text).group(1).lstrip().title()
    return initials, names
I need the following outputs:
x,y = get_data('J JS JOHN SMITH EMP 223456')
JS
John Smith
x,y = get_data('J JB JOE BLOGGS CONT 223456')
JB
Joe Bloggs
x,y = get_data('J JS JOHN SMITH 223456')
JS
John Smith
I can do it with either EMP or CONT, but I am struggling to handle EMP, CONT, or neither. I'm new to regex, so any help is appreciated.
Upvotes: 2
Views: 54
Reputation: 784878
No need to do a split and then search.
You can use a single regex with re.findall, re.search, or re.match:
^\S+\s+(\S+)\s+(.+?)(?:\s+(?:EMP|CONT))?\s+\d+
RegEx Details:
^ : Start
\S+ : Match 1+ non-whitespaces
\s+ : Match 1+ whitespaces
(\S+) : Match 1+ non-whitespaces and capture in group #1
\s+ : Must be followed by 1+ whitespaces
(.+?) : Match 1+ of any character and capture in group #2
(?:\s+(?:EMP|CONT))? : Optionally match EMP or CONT after 1+ whitespaces
\s+\d+ : Followed by 1+ whitespaces and 1+ digits
Upvotes: 2
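For reference, here is a minimal sketch of how the regex above could be dropped into the question's get_data function. The PATTERN name and the ValueError on a non-matching line are my own assumptions, not part of the original code; .title() is kept from the question to convert the captured name to title case.

```python
import re

# The regex from the answer, compiled once (PATTERN is a hypothetical name).
PATTERN = re.compile(r'^\S+\s+(\S+)\s+(.+?)(?:\s+(?:EMP|CONT))?\s+\d+')

def get_data(text):
    match = PATTERN.search(text)
    if match is None:
        # Assumption: treat a line that does not fit the layout as an error.
        raise ValueError(f'unexpected format: {text!r}')
    initials, names = match.groups()
    # Group 1 holds the initials, group 2 the name without EMP/CONT.
    return initials, names.title()

for line in ('J JS JOHN SMITH EMP 223456',
             'J JB JOE BLOGGS CONT 223456',
             'J JS JOHN SMITH 223456'):
    print(get_data(line))
```

Because group #2 is lazy and the EMP/CONT group is optional, the same pattern should cover all three input shapes from the question without a separate split step.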