Reputation: 147
I have a string:
Manager of Medical Threat Devlop at Micro
I want to find the words that come after the last at, for, or of. Here, I want to get ['Micro'] (the text at the end of the string, after the last "at").
Current code
If I apply r'(?:for|at|of)\s+(.*)', I get the incorrect result ['Medical Threat Devlop at Micro'], because the match starts at the first keyword rather than the last one.
More examples:
Manager of Medical Threat Devlop at Canno -> Canno
Manager of Medicalof Threat Devlop of Canno -> Canno
Manager of Medicalfor Threat Devlop for Canno -> Canno
Threat Devlop at Canno Matt -> Canno Matt
Upvotes: 2
Views: 750
Reputation: 6224
Try re.split; it should work for this.
Your question is not fully clear, so please give some more input and output examples.
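import re
s = 'Manager of Medical Threat Devlop at Micro'
# \b makes the keywords match only as whole words (not the "at" inside "Threat")
# [-1:] keeps the text after the last keyword, as a one-element list
s = re.split(r'\b(?:at|for|of)\s+', s)[-1:]
print(s)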
IN : OUTPUT
'Manager of Medical Threat Devlop at Micro' : ['Micro']
'Threat Devlop at Canno Matt' : ['Canno Matt']
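As a quick check, here is a minimal sketch that runs the split approach over the examples from the question. The helper name last_part and the extra input 'Works at Great Company' are hypothetical, added only for illustration. The pattern uses a \b word boundary, on the assumption that a keyword embedded in a longer word (such as the "at" in "Great") should not split the string.

```python
import re

# Assumption: keywords count only as whole words followed by whitespace.
KEYWORDS = re.compile(r'\b(?:at|for|of)\s+')

def last_part(s):
    """Hypothetical helper: return the text after the last keyword.

    If no keyword is present, re.split returns [s], so s comes back unchanged.
    """
    return KEYWORDS.split(s)[-1]

examples = [
    'Manager of Medical Threat Devlop at Canno',
    'Manager of Medicalof Threat Devlop of Canno',
    'Manager of Medicalfor Threat Devlop for Canno',
    'Threat Devlop at Canno Matt',
    'Works at Great Company',  # made-up input: "Great" ends in "at"
]

for s in examples:
    print(repr(s), '->', repr(last_part(s)))
```

Without the \b, a pattern such as 'at |for |of ' would also split inside "Great " and return only 'Company' for the last input.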
There is another way to do this, using re.finditer:
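import re
string = 'Threat Devlop at Canno Matt'
# \b keeps the "at" inside "Threat" from matching; \s+ requires whitespace after the keyword
matches = list(re.finditer(r'\b(?:at|for|of)\s+', string))
# Slice from the end of the last match; keep the whole string if nothing matched
last_index = matches[-1].end() if matches else 0
print(string[last_index:])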
I am not good at re at all (but I get the idea).
There is another way to do this, using re.findall:
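import re
string = 'Threat Devlop at Canno of Matjkasa'
# The greedy .* runs up to the last whole-word keyword, so s holds at most one prefix
s = re.findall(r'.*\b(?:at|for|of)\s+', string)
# Remove that prefix if one was found; otherwise leave the string unchanged
print(string.replace(s[0], '', 1) if s else string)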
Upvotes: 2
Reputation: 627468
You can use
re.findall(r'.*\b(?:for|at|of)\s+(.*)', text)
See the regex demo. Details:
.* - any zero or more chars other than line break chars, as many as possible
\b - a word boundary
(?:for|at|of) - for, at or of
\s+ - one or more whitespaces
(.*) - Group 1: any zero or more chars other than line break chars, as many as possible.
Another regex that will fetch the same results is
re.findall(r'\b(?:for|at|of)\s+((?:(?!\b(?:for|at|of)\b).)*)$', text)
Details:
\b - a word boundary
(?:for|at|of) - for, at or of
\s+ - one or more whitespaces
((?:(?!\b(?:for|at|of)\b).)*) - Group 1: any char other than a line break char, zero or more occurrences, as many as possible, that does not start a for, at or of whole-word char sequence
$ - end of string.
Note you can also use re.search since you expect a single match:
import re
text = 'Manager of Medical Threat Devlop at Micro'
match = re.search(r'.*\b(?:for|at|of)\s+(.*)', text)
if match:
    print(match.group(1))  # Micro
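To see how the two patterns from this answer behave on the inputs from the question, here is a minimal sketch. The names GREEDY, TEMPERED and tail are hypothetical, introduced only for this illustration; the patterns themselves are the ones shown above.

```python
import re

# Pattern 1: the greedy .* pushes the match to the last whole-word keyword.
GREEDY = re.compile(r'.*\b(?:for|at|of)\s+(.*)')

# Pattern 2: after a keyword, consume chars up to the end of the string,
# refusing to step onto another whole-word keyword along the way.
TEMPERED = re.compile(r'\b(?:for|at|of)\s+((?:(?!\b(?:for|at|of)\b).)*)$')

def tail(pattern, text):
    """Hypothetical helper: return Group 1 of the first match, or None."""
    m = pattern.search(text)
    return m.group(1) if m else None

examples = [
    'Manager of Medical Threat Devlop at Canno',
    'Manager of Medicalof Threat Devlop of Canno',
    'Manager of Medicalfor Threat Devlop for Canno',
    'Threat Devlop at Canno Matt',
]

for text in examples:
    print(repr(text), '->', repr(tail(GREEDY, text)), repr(tail(TEMPERED, text)))
```

Both patterns return the same tail for each of these four inputs. Returning None when nothing matches is a choice made for this sketch; returning the original string would work just as well.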
Upvotes: 2
Reputation: 15502
If you want to do it with a regex, then here's the way to do it.
Replace matches of the following regex with the empty string:
.*\b(?:for|at|of)\b\s?
This will match:
.* : any character (being greedy, this pattern matches as many characters as possible)
\b(?:for|at|of)\b : your hotwords between word boundaries
\s? : an optional space
Check the demo here.
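In Python, the replacement described above could be sketched with re.sub. The helper name strip_prefix is hypothetical, and count=1 is an assumption made here so that only the leading match is removed.

```python
import re

# The regex from this answer: everything up to and including the last
# whole-word keyword, plus at most one following whitespace character.
PREFIX = re.compile(r'.*\b(?:for|at|of)\b\s?')

def strip_prefix(text):
    """Hypothetical helper: drop everything through the last keyword."""
    return PREFIX.sub('', text, count=1)

for text in [
    'Manager of Medical Threat Devlop at Micro',
    'Manager of Medicalof Threat Devlop of Canno',
    'Threat Devlop at Canno Matt',
]:
    print(repr(text), '->', repr(strip_prefix(text)))
```

Because \s? consumes at most one whitespace character, an input with several spaces after the keyword would keep the extra ones; \s* may be a better fit if that can happen.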
Upvotes: 1