RegEx for matching specific pattern in a Python list

Question

Say, I have the following code:

import re
strings_of_text = ['data0', 'data23', 'data2', 'data55', 'data_mismatch', 'green']
strings_to_keep = []
expression_to_use = r'^\d{2}$|(?<=\s)\d{2}(?=\s)|(?<=\s)\d{2}$|^\d{2}(?=\s)'
    
for string in strings_of_text:
    # If the string is data#
    if (re.search(expression_to_use, string)):
        strings_to_keep.append(string)
print(strings_to_keep)

I am only interested in strings that consist of "data" followed by a number. So in this case, I would only want to keep 'data0', 'data23', 'data2', and 'data55'.

How can I do this? I think I need the re module, but I'm not sure how to use it.

I have read this question: "Python Regular Expression looking for two digits only".

But when I change my regular expression to this pattern

^\d{2}$|(?<=\s)\d{2}(?=\s)|(?<=\s)\d{2}$|^\d{2}(?=\s)

it does not work: no string from the list is kept. This is where I am stuck. I am new to regular expressions, so thanks in advance to everyone who answers.

EDIT:

Here is the outcome I am trying to get:

print(strings_to_keep)
# ['data0', 'data23', 'data2', 'data55']

The fourth bird · Accepted Answer

Your pattern uses four alternatives, but none of them takes the word data into account: each one only matches exactly two digits that either form the whole string or are delimited by whitespace.
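To see this concretely, the sketch below runs the pattern from the question against the list, and then against a few extra strings that are made up purely for illustration. Only standalone two-digit numbers are accepted, so something like data23 never matches.

```python
import re

# Pattern copied from the question: exactly two digits that are the whole
# string, or are bounded by whitespace and/or the start/end of the string.
question_pattern = r'^\d{2}$|(?<=\s)\d{2}(?=\s)|(?<=\s)\d{2}$|^\d{2}(?=\s)'

strings_of_text = ['data0', 'data23', 'data2', 'data55', 'data_mismatch', 'green']

# No list item is a standalone two-digit number, so nothing is kept.
kept = [s for s in strings_of_text if re.search(question_pattern, s)]
print(kept)  # []

# Hypothetical inputs the pattern does accept: each one finds '23'.
for sample in ['23', 'abc 23 def', 'abc 23', '23 abc']:
    print(repr(sample), '->', re.search(question_pattern, sample).group())

# Hypothetical inputs it rejects: digits glued to letters, or three digits.
for sample in ['data23', '123']:
    print(repr(sample), '->', re.search(question_pattern, sample))  # None
```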

You could use re.match instead, which only tries to match at the beginning of the string, together with the pattern data\d+$, which matches data followed by one or more digits up to the end of the string:

import re
strings_of_text = ['data0', 'data23', 'data2', 'data55', 'data_mismatch', 'green']
strings_to_keep = []
expression_to_use = r'data\d+$'

for string in strings_of_text:
    # Keep the string only if it is "data" followed by one or more digits
    if re.match(expression_to_use, string):
        strings_to_keep.append(string)

print(strings_to_keep)
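Why re.match rather than re.search: re.search scans the whole string, so with data\d+$ it would also accept a string that merely ends in data plus digits, such as the made-up example 'mydata1'. One more detail: $ also matches just before a trailing newline, so 'data1\n' is accepted as well. If that matters, re.fullmatch (an alternative not used in the answer above) requires the pattern to cover the entire string. A small sketch of these cases:

```python
import re

pattern = r'data\d+$'

# re.search scans the whole string, so a prefix before "data" is tolerated.
print(bool(re.search(pattern, 'mydata1')))   # True
# re.match only tries at position 0, so the same string is rejected.
print(bool(re.match(pattern, 'mydata1')))    # False

# '$' can match just before a final newline, so this is still accepted...
print(bool(re.match(pattern, 'data1\n')))    # True
# ...whereas re.fullmatch requires the pattern to span the entire string.
print(bool(re.fullmatch(r'data\d+', 'data1\n')))  # False
print(bool(re.fullmatch(r'data\d+', 'data1')))    # True
```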


Instead of appending to a new list in a loop, you could also filter the existing collection, for example with filter:

import re
strings_of_text = ['data0', 'data23', 'data2', 'data55', 'data_mismatch', 'green']
expression_to_use = r'data\d+$'

# Keep only the items for which re.match returns a match (a Match object is truthy)
strings_of_text = list(filter(lambda x: re.match(expression_to_use, x), strings_of_text))
print(strings_of_text)

Result

['data0', 'data23', 'data2', 'data55']
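A list comprehension is another common way to write the same filter. The sketch below precompiles the pattern with re.compile and uses fullmatch instead of re.match with $; the constant name DATA_NUMBER is only an illustrative choice. For this particular list it produces the same result as the filter version.

```python
import re

# Compile once: "data" followed by one or more digits, and nothing else.
DATA_NUMBER = re.compile(r'data\d+')

strings_of_text = ['data0', 'data23', 'data2', 'data55', 'data_mismatch', 'green']

# Keep only the strings that the whole pattern matches from start to end.
strings_to_keep = [s for s in strings_of_text if DATA_NUMBER.fullmatch(s)]
print(strings_to_keep)  # ['data0', 'data23', 'data2', 'data55']
```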
