DBS

Reputation: 1147

Python regex statement not returning correct results

My regex is not returning the correct results. I am using the github3.py library to fetch data from GitHub, and the patch key of .md files (https://developer.github.com/v3/pulls/#list-pull-requests-files) can contain three possible strings. I've read the regex documentation and several threads, but I'm missing something in my syntax.

string1 = '> [HELP.SELECTOR]'
string2 = '-> [HELP.SELECTOR]'
string3 = '+> [HELP.SELECTOR]'

I want to print True when string2 or string3 is found, and False if only string1 is found. Instead, I get False even when string2 or string3 is present.

for prs in repo.pull_requests():
    search_string_found = 'False'
    regex_search_string1 = re.compile(r"^\+>\s\[HELP.SELECTOR\]")
    regex_search_string2 = re.compile(r"^->\s\[HELP.SELECTOR\]")
    for data in repo.pull_request(prs.number).files():
        match_text1 = regex_search_string1.search(data.patch)
        match_text2 = regex_search_string2.search(data.patch)                        
        if match_text1 is not None and match_text2 is not None:
            search_string_found = 'True'
            break

    print('HELP.SELECTOR present in file: ', search_string_found)

Upvotes: 1

Views: 69

Answers (2)

Wiktor Stribiżew

Reputation: 626853

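Since you confirmed that the strings may not be located at the start of the string, drop the ^ anchor. Note also that your condition uses and, so it only succeeds when both patterns match the same patch. A single pattern with a character class avoids both problems: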

regex_search_string = re.compile(r"[+-]>\s\[HELP\.SELECTOR\]")
for data in repo.pull_request(prs.number).files():
    match_text = regex_search_string.search(data.patch)
    if match_text:
        search_string_found = 'True'
        break

Note:

  • [+-] is a character class: it matches exactly one character from the set inside it, here either + or -
  • + inside [...] never has to be escaped
  • - at the start or end of [...] does not have to be escaped
  • re.search returns a match object or None, so check the result before accessing the matched or captured text
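To make the notes above concrete, here is a minimal sketch that runs the same pattern against the three strings from the question plus one multi-line string. The `samples` list and the `has_selector_change` helper are hypothetical names used only for illustration; they stand in for pieces of data.patch.

```python
import re

# Same pattern as above; the dot is escaped so it matches a literal "."
pattern = re.compile(r"[+-]>\s\[HELP\.SELECTOR\]")

# Hypothetical sample strings standing in for pieces of data.patch
samples = [
    "> [HELP.SELECTOR]",
    "-> [HELP.SELECTOR]",
    "+> [HELP.SELECTOR]",
    "context line\n+> [HELP.SELECTOR]",
]

def has_selector_change(text):
    """Return True if '+> [HELP.SELECTOR]' or '-> [HELP.SELECTOR]' occurs anywhere in text."""
    # search() scans the whole string and returns None when nothing matches
    return pattern.search(text) is not None

results = [has_selector_change(s) for s in samples]
print(results)
```

The last sample shows why search is used rather than match: the text of interest is not at the start of the string.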

Upvotes: 1

Robᵩ

Reputation: 168626

It is easier to maintain one regex string than several. Try this:

import re

strings = [
     '> [HELP.SELECTOR]',
     '-> [HELP.SELECTOR]',
     '+> [HELP.SELECTOR]',
]

for string in strings:
    # [-+] accepts a leading - or +; \. matches a literal dot; $ anchors the end
    print(bool(re.match(r'[-+]> \[HELP\.SELECTOR\]$', string)), string)

Result:

False > [HELP.SELECTOR]
True -> [HELP.SELECTOR]
True +> [HELP.SELECTOR]

Applying that to your problem,

#UNTESTED
for prs in repo.pull_requests():
    search_string_found = any(
        re.search(r'[-+]> \[HELP\.SELECTOR\]', data.patch)
        for data in repo.pull_request(prs.number).files())
    print('HELP.SELECTOR present in file: ', search_string_found)
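If you want to count only lines that begin with the + or - diff marker, one option is to anchor the pattern with ^ and compile it with re.MULTILINE. The sketch below is self-contained and does not call GitHub: the `files` list, the sample patch text, and the `patch_touches_selector` helper are hypothetical stand-ins for what .files() returns. The guard for a missing patch is an assumption made as a precaution, not documented github3.py behavior.

```python
import re
from types import SimpleNamespace

# ^ with re.MULTILINE anchors at the start of every line, not just the string
SELECTOR_CHANGE = re.compile(r"^[-+]> \[HELP\.SELECTOR\]", re.MULTILINE)

def patch_touches_selector(patch):
    """True if any line in the patch starts with '+> [HELP.SELECTOR]' or '-> [HELP.SELECTOR]'."""
    # Assumption: patch may be missing (None or empty); treat that as no match
    return bool(patch) and SELECTOR_CHANGE.search(patch) is not None

# Hypothetical stand-ins for the objects returned by .files()
files = [
    SimpleNamespace(patch="@@ -1,2 +1,2 @@\n > [HELP.SELECTOR]\n text"),
    SimpleNamespace(patch=None),
    SimpleNamespace(patch="@@ -3,1 +3,1 @@\n-> [HELP.SELECTOR]\n+> [OTHER]"),
]

found = any(patch_touches_selector(f.patch) for f in files)
print('HELP.SELECTOR present in file: ', found)
```

The first sample patch contains the selector only as an unchanged context line, so it does not count. The third contains a removed line, which does.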

Upvotes: 0
