codiearcher

Python regex to find sequences matching: word + whitespace + word

I am new to using regular expressions and have been trying to figure out a way of selecting the elements of a list that contain two words separated by whitespace.

I have the following dummy list: ['word <= 0.02', 'word_one <= 0.04', 'word two <= 0.01']

I would like only the third element to match ('word two <= 0.01').

I have tried using \b\w+(?=\s)\b, which I pieced together from other related questions on Stack Overflow. I know this doesn't work, since there is also whitespace after the second word (before <=), but I am stuck on how to fix it.

Here is an example of my code:

import re
example_list = ['word <= 0.02', 'word_one <= 0.04', 'word two <= 0.01']
new_list = []
regex = '\b\w+(?=\s)\b'
for i in example_list:
    if re.match(regex, i):
        new_list.append(i)
print(new_list)


Answers (1)

Wiktor Stribiżew

To match a string that starts with 1+ word chars, then 1+ whitespace chars, and then 1+ word chars again (followed by a word boundary), you may use:

import re
example_list = ['word <= 0.02', 'word_one <= 0.04', 'word two <= 0.01']
new_list = []
regex = r'\w+\s+\w+\b'
for i in example_list:
    if re.match(regex, i):
        new_list.append(i)
print(new_list)
# => ['word two <= 0.01']

See the Python demo.

Note that re.match already anchors the match at the start of the string, hence there is no ^ in the above regex. Also, since you used a regular string literal, the \b sequences in your pattern are backspace characters, not word boundaries; use a raw string literal (r'...') for regex patterns.
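A quick way to see both problems is to compare the question's pattern as written, the same pattern as a raw string, and the pattern above. This is only a sketch: the helper name select and the variable names plain and raw are made up for illustration. Note that even with the raw prefix, the original pattern only checks for "one word followed by whitespace", so it would accept all three items.

```python
import re

example_list = ['word <= 0.02', 'word_one <= 0.04', 'word two <= 0.01']

# '\b' in a regular string literal is the backspace character (U+0008).
print('\b' == '\x08', len('\b'), len(r'\b'))  # True 1 2

# Same string the question's non-raw literal produces: \b -> backspace,
# while \w and \s survive only because Python keeps unrecognised escapes
# (with a warning in recent versions). Written with \\ here to avoid that warning.
plain = '\b\\w+(?=\\s)\b'
raw = r'\b\w+(?=\s)\b'

def select(pattern, items):
    """Hypothetical helper: keep the items that re.match finds at the start."""
    return [s for s in items if re.match(pattern, s)]

print(select(plain, example_list))             # [] (no item starts with a backspace)
print(select(raw, example_list))               # all three items
print(select(r'\w+\s+\w+\b', example_list))    # ['word two <= 0.01']
```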

If you need to find a word char + whitespace(s) + word char anywhere in the string, replace re.match with re.search; then even r'\w\s+\w' is enough. Or, if you really need to check word boundaries, use r'\b\w+\s+\w+\b'.
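The difference between the anchored and unanchored variants only shows up when the two words are not at the very start. The sketch below adds a hypothetical item, '<= word three', that is not part of the question's data, purely to illustrate this.

```python
import re

example_list = ['word <= 0.02', 'word_one <= 0.04', 'word two <= 0.01']
# Hypothetical extra item where the two-word sequence is not at the start.
items = example_list + ['<= word three']

# re.match anchors at position 0, so '<= word three' is rejected.
anchored = [s for s in items if re.match(r'\w+\s+\w+\b', s)]
# re.search scans the whole string, so the shorter pattern is enough.
anywhere = [s for s in items if re.search(r'\w\s+\w', s)]
# Same idea, but requiring whole words on both sides.
bounded = [s for s in items if re.search(r'\b\w+\s+\w+\b', s)]

print(anchored)  # ['word two <= 0.01']
print(anywhere)  # ['word two <= 0.01', '<= word three']
print(bounded)   # ['word two <= 0.01', '<= word three']

# The short pattern does not work with re.match: after 'w' it needs whitespace.
print(re.match(r'\w\s+\w', 'word two <= 0.01'))  # None
```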
