Kyle DeGennaro

Reputation: 198

RegEx for matching specific pattern in a Python list

Say I have the following code:

import re
strings_of_text = ['data0', 'data23', 'data2', 'data55', 'data_mismatch', 'green']
strings_to_keep = []
expression_to_use = r'^\d{2}$|(?<=\s)\d{2}(?=\s)|(?<=\s)\d{2}$|^\d{2}(?=\s)'
    
for string in strings_of_text:
    # If the string is data#
    if (re.search(expression_to_use, string)):
        strings_to_keep.append(string)
print(strings_to_keep)

I only want to keep strings that consist of "data" followed by some number. In this case, that means only 'data0', 'data23', 'data2', and 'data55'.

How can I do this? I am thinking I will need to import re but I'm not sure how to use it.

I have read this: Python Regular Expression looking for two digits only

But when I try to adapt the regular expression from that answer:

^\d{2}$|(?<=\s)\d{2}(?=\s)|(?<=\s)\d{2}$|^\d{2}(?=\s)

It does not work, and this is where I am stuck. I am new to regular expressions, so thank you in advance to everyone who answers.

EDIT:

Here is the outcome I am trying to get:

print(strings_to_keep)
>>> ['data0', 'data23', 'data2', 'data55']

Upvotes: 2

Views: 9811

Answers (2)

The fourth bird

Reputation: 163632

Your pattern uses four alternatives, but none of them takes the literal word data into account.
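
To see why the original pattern rejects every string in the list, here is a small sketch. The `samples` list is made up for illustration and is not from the question; it just contrasts inputs the original pattern accepts with the ones the asker wants to keep.

```python
import re

# The pattern from the question: exactly two digits, delimited on each side
# by the start/end of the string or by whitespace.
original = r'^\d{2}$|(?<=\s)\d{2}(?=\s)|(?<=\s)\d{2}$|^\d{2}(?=\s)'

# Hypothetical inputs chosen for illustration only.
samples = ['23', 'id 23 ok', 'data23', 'data2']

original_hits = [s for s in samples if re.search(original, s)]
print(original_hits)  # ['23', 'id 23 ok']

# Including the literal prefix in the pattern matches the intended strings.
with_prefix = r'data\d+$'
prefix_hits = [s for s in samples if re.match(with_prefix, s)]
print(prefix_hits)  # ['data23', 'data2']
```

In 'data23' the digits are preceded by the letter a, not by whitespace or the start of the string, so none of the four alternatives can match.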

You could use re.match instead, which only matches at the beginning of the string, together with the pattern data\d+$ to match data followed by one or more digits up to the end of the string:

import re
strings_of_text = ['data0', 'data23', 'data2', 'data55', 'data_mismatch', 'green']
strings_to_keep = []
expression_to_use = r'data\d+$'

for string in strings_of_text:
    # Keep the string only if it is "data" followed by digits
    if re.match(expression_to_use, string):
        strings_to_keep.append(string)

print(strings_to_keep)
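
As a side note on the choice of function, the sketch below compares re.match, re.search and re.fullmatch on the same idea. The inputs 'mydata5' and 'data5\n' are hypothetical edge cases, not part of the question's data, and re.fullmatch is offered as an alternative rather than as what the answer uses.

```python
import re

pattern = r'data\d+$'

# re.match only tries to match at position 0; re.search scans the whole string.
match_prefixed = bool(re.match(pattern, 'mydata5'))
search_prefixed = bool(re.search(pattern, 'mydata5'))
print(match_prefixed, search_prefixed)  # False True

# "$" also matches just before a trailing newline, so this still matches.
match_newline = bool(re.match(pattern, 'data5\n'))
print(match_newline)  # True

# re.fullmatch requires the entire string to match, so no "$" is needed
# and a trailing newline is rejected.
full_newline = bool(re.fullmatch(r'data\d+', 'data5\n'))
full_plain = bool(re.fullmatch(r'data\d+', 'data55'))
print(full_newline, full_plain)  # False True
```

If strings with a trailing newline should be rejected, re.fullmatch (or \Z in place of $) is the stricter choice.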


Instead of building a second list, you could also filter the original collection in place of itself, for example with filter:

import re
strings_of_text = ['data0', 'data23', 'data2', 'data55', 'data_mismatch', 'green']
expression_to_use = r'data\d+$'

strings_of_text = list(filter(lambda x: re.match(expression_to_use, x), strings_of_text))
print(strings_of_text)

Result

['data0', 'data23', 'data2', 'data55']


Upvotes: 1

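Consider using re.compile if you apply the same pattern repeatedly. The module-level functions cache compiled patterns, so the saving is modest, but a compiled pattern skips that cache lookup on every call and gives the pattern a name.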

strings_of_text = ['data0', 'data23', 'data2', 'data55', 'data_mismatch', 'green']

import re
engine = re.compile(r'data\d+$')
strings_to_keep = [s for s in strings_of_text if engine.match(s)]
print(strings_to_keep) # ['data0', 'data23', 'data2', 'data55']
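
If you want to check the overhead yourself, a rough sketch with timeit could look like the following. The helper names with_module_function and with_compiled_pattern are made up for this example, and the timings depend on your machine and Python version, so none are shown here. Since the re module caches recently used patterns, the difference is usually small.

```python
import re
import timeit

strings_of_text = ['data0', 'data23', 'data2', 'data55', 'data_mismatch', 'green']
engine = re.compile(r'data\d+$')

def with_module_function():
    # Goes through re.match, which looks the pattern up in re's internal cache.
    return [s for s in strings_of_text if re.match(r'data\d+$', s)]

def with_compiled_pattern():
    # Calls the precompiled pattern object directly.
    return [s for s in strings_of_text if engine.match(s)]

# A compiled pattern's bound method can also be passed straight to filter.
kept = list(filter(engine.match, strings_of_text))
print(kept)  # ['data0', 'data23', 'data2', 'data55']

# Timings vary between runs and machines; compare them locally.
print(timeit.timeit(with_module_function, number=10_000))
print(timeit.timeit(with_compiled_pattern, number=10_000))
```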

Upvotes: 0
