Regex Not Matching Pattern from BeautifulSoup results

Question

I'm trying to parse some HTML and extract a value with a regular expression. When I validate the regex with online tools, it works and finds the value. However, when I apply the same pattern to BeautifulSoup output, it fails to match.

I am looking to grab this data: /some/path/to/file?accountTransactionID=f2448439-ec25-4a61-a6f4-4c6fa0767f19&accountNumber=123456&searchValue=ABC123&isActiveHistory=True

From this line:

 var url = '/some/path/to/file?accountTransactionID=f2448439-ec25-4a61-a6f4-4c6fa0767f19&accountNumber=123456&searchValue=ABC123&isActiveHistory=True'

in the demo HTML below.

Here is the Python script I'm working with. I have consulted several SO questions, including this one, but have not had any success.

If I use soup = BeautifulSoup(fp, 'html.parser').find_all(string=PATTERN), the full text of the script is stored in a list. I've tried looping through that list to find the text again, but it always comes up empty.

What have I done wrong?

Python:

import os
import re

from bs4 import BeautifulSoup

FILE_PATH = os.getcwd() + '/demo.html'
PATTERN = re.compile('var url = \'(.*?)\'')

with open(FILE_PATH) as fp:
    soup = BeautifulSoup(fp, 'html.parser')  # .find_all(string=PATTERN)
    data = PATTERN.match(str(soup))
    print(f'Data: {data}')
    # for script in soup:
    #     print(script)
    #     data = PATTERN.match(str(script))
    #     if data is not None:
    #         print(f'Data: {data}')
    #     else:
    #         print('NO DATA FOUND')

Outputs: Data: None
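
For comparison, here is a minimal sketch of one likely cause, using only the re module. re.match only succeeds when the pattern matches at the very start of the string, whereas re.search scans the whole string for the first occurrence. SAMPLE_HTML is a hypothetical stand-in for str(soup), since the real demo.html is not reproduced here, and its url value is shortened for illustration.

```python
import re

# Hypothetical stand-in for str(soup); the real demo.html is not shown.
SAMPLE_HTML = """<html><head><title>Demo</title></head>
<body><script>
    var url = '/some/path/to/file?accountNumber=123456&searchValue=ABC123'
</script></body></html>"""

PATTERN = re.compile(r"var url = '(.*?)'")

# match() is anchored at position 0, and the document starts with "<html>",
# so this returns None.
anchored = PATTERN.match(SAMPLE_HTML)

# search() scans the whole string for the first occurrence.
found = PATTERN.search(SAMPLE_HTML)

# group(1) is the captured text between the quotes, not the whole match.
url = found.group(1) if found else None

print(f"match:  {anchored}")
print(f"search: {url}")
```

The same reasoning would apply to each string returned by find_all(string=PATTERN): calling PATTERN.search(...) on it and reading .group(1) should yield the captured URL, assuming the script text contains the var url line quoted above.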

HTML:

Regex Not Matching Pattern from BeautifulSoup results