Reputation: 55
How do I write a regular expression that finds all words starting with a specified string? For example:
a = "asasasa sasDRasas dr.klklkl DR.klklklkl Dr klklklkklkl"
Here I want to fetch all words that start with dr, ignoring case. Everything I tried returns matches wherever dr appears inside a word, not only at the start of a word.
Thanks in advance.
Upvotes: 5
Views: 7460
Reputation: 7971
Yet another solution.
This expression finds and returns the words in a string that either equal or start with the value of a string variable.
import re
txt = "this a a dr.seuse dr.brown dr. oz dr noone"
suggtxt = "dr."
# re.escape() makes the "." literal; \S* then consumes the rest of the word
w_regex = r"\b" + re.escape(suggtxt) + r"\S*"
x = re.findall(w_regex, txt, re.IGNORECASE)
print(x)
Output:
['dr.seuse', 'dr.brown', 'dr.']
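To illustrate why re.escape() matters here, the following is a small sketch comparing the escaped and unescaped prefix on the same text. The variable names prefix, unescaped and escaped are only illustrative and are not part of the answer above.

```python
import re

txt = "this a a dr.seuse dr.brown dr. oz dr noone"
prefix = "dr."  # plays the same role as suggtxt above

# Unescaped: the "." in the prefix matches any character, including a space.
unescaped = re.findall(r"\b" + prefix + r"\S*", txt, re.IGNORECASE)

# Escaped: the "." only matches a literal dot.
escaped = re.findall(r"\b" + re.escape(prefix) + r"\S*", txt, re.IGNORECASE)

print(unescaped)  # ['dr.seuse', 'dr.brown', 'dr.', 'dr noone']
print(escaped)    # ['dr.seuse', 'dr.brown', 'dr.']
```

Without escaping, "dr noone" is picked up because the dot happens to match the space after "dr".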
Upvotes: 0
Reputation: 1268
>>> string_to_search_in
'this a a dr.seuse dr.brown dr. oz dr noone'
>>> re.compile(r'\bdr\.?\s*\w+', re.IGNORECASE).findall(string_to_search_in)
['dr.seuse', 'dr.brown', 'dr. oz', 'dr noone']
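One thing to watch with patterns like this: \b only means "word boundary" when the pattern is written as a raw string. In a normal Python string literal, "\b" is the backspace character. A minimal sketch of the difference (the names text, plain and raw are only illustrative):

```python
import re

text = "this a a dr.seuse dr.brown dr. oz dr noone"

# In a normal string literal "\b" is one character (backspace, 0x08);
# in a raw string it is a backslash followed by "b".
print(len("\b"), len(r"\b"))  # 1 2

# Non-raw: the regex looks for a literal backspace before "dr".
plain = re.findall("\bdr", text, re.IGNORECASE)

# Raw: the regex engine sees \b and treats it as a word boundary.
raw = re.findall(r"\bdr", text, re.IGNORECASE)

print(plain)  # []
print(raw)    # ['dr', 'dr', 'dr', 'dr']
```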
Upvotes: 0
Reputation: 27216
@Ferdinand Beyer's answer shows how to do it by regex. But you can easily achieve that with string functions:
>>> a
'asasasa sasDRasas dr.klklkl DR.klklklkl Dr klklklkklkl'
>>> import string
>>> cleaned = "".join(" " if i in string.punctuation else i for i in a)
>>> cleaned
'asasasa sasDRasas dr klklkl DR klklklkl Dr klklklkklkl'
>>> [word for word in cleaned.split() if word.lower().startswith("dr")]
['dr', 'DR', 'Dr']
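If the goal is to keep each matching word intact rather than only its "dr" part, a possible variation (a sketch, not the approach shown above) is to skip the punctuation cleanup and split on whitespace alone:

```python
a = "asasasa sasDRasas dr.klklkl DR.klklklkl Dr klklklkklkl"

# Split on whitespace only, so punctuation stays attached to each word.
words = [word for word in a.split() if word.lower().startswith("dr")]

print(words)  # ['dr.klklkl', 'DR.klklklkl', 'Dr']
```

Here "sasDRasas" is skipped because the check is applied to the start of each whitespace-separated word.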
Upvotes: 1
Reputation: 67137
You can use \b to find word boundaries, and the re.IGNORECASE flag to search case-insensitively.
import re
a = "asasasa sasDRasas dr.klklkl DR.klklklkl Dr klklklkklkl"
for match in re.finditer(r'\bdr', a, re.IGNORECASE):
    # match.group(0) is the matched text, match.start() its index in a
    print('Found match: "{0}" at position {1}'.format(match.group(0), match.start()))
This will output:
Found match: "dr" at position 18
Found match: "DR" at position 28
Found match: "Dr" at position 40
Here, the pattern \bdr matches dr, but only if it is found at the start of a word. This will also yield matches for strings like driving. If you only want to find dr as a standalone word, use \bdr\b.
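As a quick illustration of the difference between the two patterns, here is a sketch on a made-up sample string (sample, prefix_hits and whole_hits are hypothetical names, not from the question):

```python
import re

sample = "Dr driving dr. sasDRasas"  # hypothetical sample text

# \bdr: "dr" at the start of any word, including longer words.
prefix_hits = re.findall(r"\bdr", sample, re.IGNORECASE)

# \bdr\b: "dr" only when it is a complete word on its own.
whole_hits = re.findall(r"\bdr\b", sample, re.IGNORECASE)

print(prefix_hits)  # ['Dr', 'dr', 'dr']
print(whole_hits)   # ['Dr', 'dr']
```

The "dr" in "driving" is dropped by the second pattern, while "dr." still matches because the dot is not a word character.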
I use re.finditer() to scan through the search string and yield every match for dr in a loop. The re.IGNORECASE flag causes dr to also match DR, Dr and dR.
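If you want the whole word rather than just the dr prefix, one option is to extend the pattern past the prefix. The sketch below shows two possible choices, \w* and \S*; which one fits depends on whether punctuation such as "." should count as part of the word, which the question does not specify.

```python
import re

a = "asasasa sasDRasas dr.klklkl DR.klklklkl Dr klklklkklkl"

# \w* stops at the first non-word character (such as ".").
short_words = re.findall(r"\bdr\w*", a, re.IGNORECASE)

# \S* keeps going until the next whitespace.
long_words = re.findall(r"\bdr\S*", a, re.IGNORECASE)

print(short_words)  # ['dr', 'DR', 'Dr']
print(long_words)   # ['dr.klklkl', 'DR.klklklkl', 'Dr']
```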
Upvotes: 10