user1717931

Reputation: 2501

How to find all words that match a regex?

This might have been asked before, but I am unable to find a solution. Suppose my text is 'C:\\Windows\\system32\\cmd.exe /v /c hello cmd.exe' and I want to find and remove all words that match the regex r'cmd.exe'. The result must be: '/v /c hello'.

This is what I tried: first, I tried to find the indices of the word boundaries so that I could remove the words. But the indices I got were for the exact regular expression match, not for the whole matching word.

In [41]: [(m.start(0), m.end(0)) for m in re.finditer(r'\b\w*cmd.exe\w*\b', cmd)]
Out[41]: [(20, 27), (40, 47)]

In [42]: [(m.start(0), m.end(0)) for m in re.finditer(r'cmd.exe', cmd)]
Out[42]: [(20, 27), (40, 47)]

In [44]: result = re.findall(r'cmd.exe', cmd, re.I)

In [45]: result
Out[45]: ['cmd.exe', 'cmd.exe']. <-- I wanted ['C:\\Windows\\system32\\cmd.exe', 'cmd.exe']

In [48]: result = re.findall(r'cmd.exe|\bcmd.exe\b', cmd, re.I)

In [49]: result
Out[49]: ['cmd.exe', 'cmd.exe']

In short, how do I get the whole word(s) that contain the substring/regex?

Upvotes: 2

Views: 135

Answers (2)

alex

Reputation: 11400

Not saying regex is bad*, but why not simply:

txt = 'C:\\Windows\\system32\\cmd.exe /v /c hello cmd.exe'
outcome = ' '.join(part for part in txt.split(' ') if 'cmd.exe' not in part)

which gives:

'/v /c hello'
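If you need this in more than one place, the split-and-filter idea can be wrapped in a small helper. This is only a sketch: the function name drop_words_containing and the ignore_case flag are made up for illustration, and it uses split() with no argument, which (unlike split(' ')) also collapses runs of whitespace.

```python
def drop_words_containing(text, needle, ignore_case=False):
    """Return text without the whitespace-separated words that contain needle."""
    if ignore_case:
        needle = needle.lower()
    kept = []
    for word in text.split():  # no argument: split on any run of whitespace
        haystack = word.lower() if ignore_case else word
        if needle not in haystack:
            kept.append(word)
    return ' '.join(kept)


txt = 'C:\\Windows\\system32\\cmd.exe /v /c hello cmd.exe'
print(drop_words_containing(txt, 'cmd.exe'))  # /v /c hello
```

Note that this is a plain substring test, so the dot in 'cmd.exe' is literal here and needs no escaping.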

*Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

Upvotes: 3

anubhava

Reputation: 784868

You may use this regex:

>>> import re
>>> s = r'C:\\Windows\\system32\\cmd.exe /v /c hello cmd.exe'
>>> print(re.sub(r'\S*cmd\.exe\S*\s*', '', s))
/v /c hello

RegEx Details:

  • \S*: Match 0 or more non-whitespace characters
  • cmd\.exe: Match cmd.exe
  • \S*: Match 0 or more non-whitespace characters
  • \s*: Match 0 or more whitespace characters
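As for why the attempts in the question only returned the bare cmd.exe: \w does not match a backslash or a colon, so \b\w* cannot grow leftwards past the \ in front of cmd.exe, whereas \S* can. A minimal sketch of using the same \S* idea to get the whole words, their indices, and the cleaned string (the variable names are just for illustration, and re.I is carried over from the question):

```python
import re

cmd = 'C:\\Windows\\system32\\cmd.exe /v /c hello cmd.exe'

# \S* on both sides stretches the match out to the surrounding whitespace.
word_with_cmd = re.compile(r'\S*cmd\.exe\S*', re.I)

# The whole words that contain cmd.exe.
matches = word_with_cmd.findall(cmd)
print(matches)

# Start/end indices of those whole words.
spans = [(m.start(), m.end()) for m in word_with_cmd.finditer(cmd)]
print(spans)  # [(0, 27), (40, 47)]

# Remove the words, then normalise the leftover whitespace.
cleaned = ' '.join(word_with_cmd.sub('', cmd).split())
print(cleaned)  # /v /c hello
```

The split/join at the end is there because substituting alone (even with a trailing \s* in the pattern) can leave a stray space at the start or end of the result.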

Upvotes: 1
