Reputation: 2501
This might have been asked before, but I am unable to find a solution. Suppose my text is 'C:\\Windows\\system32\\cmd.exe /v /c hello cmd.exe' and I want to find and remove all words that match the regex r'cmd.exe'. The result must be: '/v /c hello'.
This is what I tried. First, I tried to find the indices of the word boundaries so that I could remove the words. But the indices I got were for the exact regular-expression match, not for the whole matching word.
In [41]: [(m.start(0), m.end(0)) for m in re.finditer(r'\b\w*cmd.exe\w*\b', cmd)]
Out[41]: [(20, 27), (40, 47)]
In [42]: [(m.start(0), m.end(0)) for m in re.finditer(r'cmd.exe', cmd)]
Out[42]: [(20, 27), (40, 47)]
In [44]: result = re.findall(r'cmd.exe', cmd, re.I)
In [45]: result
Out[45]: ['cmd.exe', 'cmd.exe']  <-- I wanted ['C:\\Windows\\system32\\cmd.exe', 'cmd.exe']
In [48]: result = re.findall(r'cmd.exe|\bcmd.exe\b', cmd, re.I)
In [49]: result
Out[49]: ['cmd.exe', 'cmd.exe']
In short, how do I get the whole word(s) that contain the substring/regex?
Upvotes: 2
Views: 135
Reputation: 11400
Not saying regex is bad*, but why not simply:
txt = 'C:\\Windows\\system32\\cmd.exe /v /c hello cmd.exe'
outcome = ' '.join([part for part in txt.split(' ') if 'cmd.exe' not in part])
which gives:
'/v /c hello'
*Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
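The split-and-filter idea above can be wrapped in a small helper. This is only a sketch: the function name `drop_words_containing` is made up for illustration, and the case-insensitive comparison is an assumption that mirrors the `re.I` flag used in the question.

```python
def drop_words_containing(text, needle):
    """Return text without the whitespace-separated words that contain needle."""
    needle = needle.lower()  # assumption: match case-insensitively, like re.I
    # split() with no argument also collapses runs of whitespace
    kept = [word for word in text.split() if needle not in word.lower()]
    return ' '.join(kept)


txt = 'C:\\Windows\\system32\\cmd.exe /v /c hello cmd.exe'
print(drop_words_containing(txt, 'cmd.exe'))  # /v /c hello
```

Because the needle is treated as a plain substring rather than a regex, the `.` in `cmd.exe` only matches a literal dot here.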
Upvotes: 3
Reputation: 784868
You may use this regex:
>>> import re
>>> s = r'C:\\Windows\\system32\\cmd.exe /v /c hello cmd.exe'
>>> print(re.sub(r'\S*cmd\.exe\S*\s*', '', s))
/v /c hello
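Two details may be worth handling on top of the substitution above: the question also asked for the matching words themselves, and removing the last word leaves a trailing space behind in the result. A minimal sketch of one way to cover both, assuming the same `\S*cmd\.exe\S*` core pattern and the `re.I` flag from the question (the variable names are illustrative):

```python
import re

s = 'C:\\Windows\\system32\\cmd.exe /v /c hello cmd.exe'

# \S* on both sides extends the match to the whole whitespace-delimited word
pattern = re.compile(r'\S*cmd\.exe\S*', re.I)

# The whole words that contain the substring
whole_words = pattern.findall(s)
print(whole_words)

# Remove the words, then normalise the leftover whitespace
cleaned = ' '.join(pattern.sub('', s).split())
print(cleaned)  # /v /c hello
```

Splitting and re-joining after the substitution takes the place of the trailing `\s*` in the pattern, so the cleaned string has no leading, trailing, or doubled spaces.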
RegEx Details:
\S*: Match 0 or more non-whitespace characters
cmd\.exe: Match cmd.exe
\S*: Match 0 or more non-whitespace characters
\s*: Match 0 or more whitespace characters
Upvotes: 1