Reputation: 403
I am new to regular expressions and have been trying to find a way to select the element of a list that contains two words separated by whitespace.
I have the following dummy list: ['word <= 0.02', 'word_one <= 0.04', 'word two <= 0.01']
I would like only the third element ('word two <= 0.01') to match.
I have tried \b\w+(?=\s)\b, which I pieced together from related Stack Overflow questions. I know it doesn't work, because there is also whitespace after the second word (before <=), but I am stuck on how to fix it.
Here is an example of my code:
import re
example_list = ['word <= 0.02', 'word_one <= 0.04', 'word two <= 0.01']
new_list = []
regex = '\b\w+(?=\s)\b'
for i in example_list:
    if re.match(regex, i):
        new_list.append(i)
print(new_list)
Upvotes: 1
Views: 1202
Reputation: 627044
To match a string that starts with one or more word chars, followed by one or more whitespace chars and then one or more word chars, you may use
import re
example_list = ['word <= 0.02', 'word_one <= 0.04', 'word two <= 0.01']
new_list = []
regex = r'\w+\s+\w+\b'
for i in example_list:
    if re.match(regex, i):
        new_list.append(i)
print(new_list)
# => ['word two <= 0.01']
Note that re.match already anchors the match at the start of the string, hence there is no ^ in the above regex. Also, because you used a regular (non-raw) string literal, each \b in your pattern is a backspace char, not a word boundary.
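A minimal sketch of the difference between a regular and a raw string literal. The pattern `\bword` and the variable names are made up for illustration only:

```python
import re

plain = '\bword'   # regular literal: '\b' becomes the backspace char (U+0008)
raw = r'\bword'    # raw literal: backslash + 'b' reach the regex engine as a word boundary

print(len(plain), len(raw))      # 5 6
print(plain[0] == '\x08')        # True

# The backspace version looks for a literal backspace, so it finds nothing here.
plain_hit = re.match(plain, 'word two')
raw_hit = re.match(raw, 'word two')
print(plain_hit is None)         # True
print(raw_hit.group())           # word
```

Writing regex patterns as raw strings (the r prefix) avoids this kind of silent escape processing.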
If you need to match a string that has a word char + whitespace(s) + word char anywhere in the string, replace re.match with re.search, and you may even use r'\w\s+\w'. Or, if you really need to check word boundaries, use r'\b\w+\s+\w+\b'.
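A short sketch comparing re.match with re.search. The last list element, 'x = word two', is not from the question; it is added here only to show a case where the two words are not at the start of the string:

```python
import re

# 'x = word two' is a hypothetical extra item for illustration.
example_list = ['word <= 0.02', 'word_one <= 0.04', 'word two <= 0.01', 'x = word two']

# re.match only tries the pattern at the start of each string.
anchored = [s for s in example_list if re.match(r'\w+\s+\w+\b', s)]

# re.search scans the whole string for the first position that matches.
anywhere = [s for s in example_list if re.search(r'\b\w+\s+\w+\b', s)]

print(anchored)  # ['word two <= 0.01']
print(anywhere)  # ['word two <= 0.01', 'x = word two']
```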
Upvotes: 2