Madhur Rampal

Reputation: 55

Search start of the word using regular expression

How do I write a regular expression that finds all words starting with a specified string? For example:

a = "asasasa sasDRasas dr.klklkl DR.klklklkl Dr klklklkklkl"

Here I want to fetch all words that start with dr, ignoring case. Everything I tried also matches dr in the middle of a word, not only at the start.

Thanks in advance.

Upvotes: 5

Views: 7460

Answers (4)

sariDon

Reputation: 7971

Yet another solution.

This expression finds the words in a string that either equal or start with the text held in a string variable.

import re

txt = "this a a dr.seuse dr.brown dr. oz dr noone"
suggtxt = "dr."
# re.escape makes the "." in the prefix literal; \S* extends the match to the next whitespace
w_regex = r"\b" + re.escape(suggtxt) + r"\S*"
x = re.findall(w_regex, txt, re.IGNORECASE)
print(x)

Output:

['dr.seuse', 'dr.brown', 'dr.']
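The same idea can be wrapped in a small helper and run against the string from the question. This is only a sketch: the function name words_starting_with is made up for illustration, and it assumes the prefix begins with a letter, digit, or underscore so that \b can anchor it.

```python
import re

def words_starting_with(text, prefix):
    """Return matches that begin with `prefix` at a word boundary and run to the next whitespace."""
    # re.escape makes characters such as "." in the prefix match literally.
    # \b anchors the prefix at a word boundary (assumes the prefix starts
    # with a letter, digit, or underscore).
    pattern = r"\b" + re.escape(prefix) + r"\S*"
    return re.findall(pattern, text, re.IGNORECASE)

a = "asasasa sasDRasas dr.klklkl DR.klklklkl Dr klklklkklkl"
print(words_starting_with(a, "dr"))
# ['dr.klklkl', 'DR.klklklkl', 'Dr']
```

Note that sasDRasas is skipped, because the DR inside it is not preceded by a word boundary.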

Upvotes: 0

sampwing

Reputation: 1268

>>> import re
>>> string_to_search_in = 'this a a dr.seuse dr.brown dr. oz dr noone'
>>> re.compile(r'\bdr\.?\s*\w+', re.IGNORECASE).findall(string_to_search_in)
['dr.seuse', 'dr.brown', 'dr. oz', 'dr noone']

Upvotes: 0

utdemir

Reputation: 27216

@Ferdinand Beyer's answer shows how to do it with a regex, but you can also achieve this with plain string functions:

>>> import string
>>> a
'asasasa sasDRasas dr.klklkl DR.klklklkl Dr klklklkklkl'
>>> cleaned = "".join(" " if i in string.punctuation else i for i in a)
>>> cleaned
'asasasa sasDRasas dr klklkl DR klklklkl Dr klklklkklkl'
>>> [word for word in cleaned.split() if word.lower().startswith("dr")]
['dr', 'DR', 'Dr']
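If you would rather keep each matching word whole instead of splitting it at the punctuation, a variant is to skip the cleaning step and test the whitespace-separated tokens directly. A minimal sketch, assuming words are separated by whitespace; the names prefix and matches are only for illustration:

```python
a = "asasasa sasDRasas dr.klklkl DR.klklklkl Dr klklklkklkl"
prefix = "dr"

# Split on whitespace and keep tokens whose lowercased form starts with the prefix.
matches = [word for word in a.split() if word.lower().startswith(prefix.lower())]
print(matches)
# ['dr.klklkl', 'DR.klklklkl', 'Dr']
```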

Upvotes: 1

Ferdinand Beyer

Reputation: 67137

You can use \b to find word boundaries, and the re.IGNORECASE flag to search case-insensitively.

import re

a = "asasasa sasDRasas dr.klklkl DR.klklklkl Dr klklklkklkl"
for match in re.finditer(r'\bdr', a, re.IGNORECASE):
    print('Found match: "{0}" at position {1}'.format(match.group(0), match.start()))

This will output:

Found match: "dr" at position 18
Found match: "DR" at position 28
Found match: "Dr" at position 40

Here, the pattern \bdr matches dr only at the start of a word. It also matches the first two letters of longer words such as driving. If you only want dr as a standalone word, use \bdr\b.

I use re.finditer() to scan through the search string and yield every match for dr in a loop. The re.IGNORECASE flag causes dr to also match DR, Dr and dR.
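To see the difference between these patterns side by side, here is a small sketch. The word driving is appended to the sample string purely for illustration, and \bdr\w* is an extra variant (not used above) that extends the match over the following word characters:

```python
import re

# Sample string from the question, with "driving" appended for illustration.
a = "asasasa sasDRasas dr.klklkl DR.klklklkl Dr klklklkklkl driving"

# Prefix only: matches "dr" wherever it starts a word.
prefix_only = re.findall(r"\bdr", a, re.IGNORECASE)
# Standalone word: "dr" must also end at a word boundary.
standalone = re.findall(r"\bdr\b", a, re.IGNORECASE)
# Prefix plus the word characters that follow it.
with_rest = re.findall(r"\bdr\w*", a, re.IGNORECASE)

print(prefix_only)  # ['dr', 'DR', 'Dr', 'dr']
print(standalone)   # ['dr', 'DR', 'Dr']
print(with_rest)    # ['dr', 'DR', 'Dr', 'driving']
```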

Upvotes: 10
