Athena Wisdom

Reputation: 6841

Python re.findall Not Matching JS Variables in HTML

I am trying to extract integers and variable values defined in JavaScript inside an HTML file using Python 3's re.findall method.

However, I am having difficulty matching digits enclosed in double quotes (") with \d*, and matching an alphanumeric string that is also enclosed in double quotes.

Case 1:

import re

s = """
   <script>
    var i = 1636592595;
        var j = i + Number("6876" + "52907");
   </script>
"""
pattern = r'var j = i + Number(\"(\d*)\" + \"(\d*)\");'
m = re.findall(pattern, s)
print(m) # Output: []

The desired output should contain 6876 and 52907, but an empty list [] was obtained.

Case 2:

s = """
       xhr.send(JSON.stringify({
              "bm-foo": "AAQAAAAE/////4ytkgqq/oWI",
              "pow": j
          }));
"""
pattern = r'"bm-foo": \"(\w*)\",'
m = re.findall(pattern, s)
print(m) # Output: []

The desired output should contain AAQAAAAE/////4ytkgqq/oWI, but an empty list [] was obtained.

Can someone explain why my regex patterns do not match?

Upvotes: 1

Views: 45

Answers (1)

Barmar

Reputation: 781096

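In the first regexp you need to escape +, (, and ), because they are regex metacharacters. Unescaped, + is a quantifier (a space followed by + means "one or more spaces"), and the parentheses create a capture group instead of matching literal ( and ) characters, so the pattern never matches the literal text i + Number("6876" + "52907");.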
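As an alternative to escaping by hand (a sketch, not something the question requires), Python's re.escape can escape the literal fragments for you, so only the capture groups are written as regex syntax:

```python
import re

s = """
   <script>
    var i = 1636592595;
        var j = i + Number("6876" + "52907");
   </script>
"""

# re.escape backslash-escapes regex metacharacters such as +, ( and ),
# so each literal fragment matches itself exactly.
pattern = (
    re.escape('var j = i + Number("')
    + r'(\d*)'
    + re.escape('" + "')
    + r'(\d*)'
    + re.escape('");')
)
print(re.findall(pattern, s))  # [('6876', '52907')]
```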

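In the second regexp, use [^"]* instead of \w*. \w matches only letters, digits, and the underscore, so it stops at the first / in the value; [^"]* matches every character up to the closing double quote.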
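A small check of that difference on the line from the question (the variable names here are just for illustration):

```python
import re

line = '"bm-foo": "AAQAAAAE/////4ytkgqq/oWI",'

# \w* stops at the first "/", and the pattern then requires a closing quote,
# so there is no match anywhere in the line.
bad = re.findall(r'"bm-foo": "(\w*)",', line)
print(bad)   # []

# [^"]* runs all the way to the closing double quote.
good = re.findall(r'"bm-foo": "([^"]*)",', line)
print(good)  # ['AAQAAAAE/////4ytkgqq/oWI']

# Shown directly: \w* consumes only the leading letters and digits.
print(re.match(r'\w*', 'AAQAAAAE/////4ytkgqq/oWI').group())  # AAQAAAAE
```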

import re

s = """
   <script>
    var i = 1636592595;
        var j = i + Number("6876" + "52907");
   </script>
"""
pattern = r'var j = i \+ Number\("(\d*)" \+ "(\d*)"\);'
m = re.findall(pattern, s)
print(m)  # [('6876', '52907')]

s = """
       xhr.send(JSON.stringify({
              "bm-foo": "AAQAAAAE/////4ytkgqq/oWI",
              "pow": j
          }));
"""
pattern = r'"bm-foo": "([^"]*)",'
m = re.findall(pattern, s)
print(m)  # ['AAQAAAAE/////4ytkgqq/oWI']
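If the page's formatting might change (an assumption; the question only shows one layout), a sketch that allows any amount of whitespace around each token is less brittle than matching single spaces literally:

```python
import re

s = """
   <script>
    var i = 1636592595;
        var j = i + Number("6876" + "52907");
   </script>
"""

# \s* allows any amount of whitespace (including none) around each token,
# so minor formatting changes in the page do not break the match.
pattern = r'var\s+j\s*=\s*i\s*\+\s*Number\(\s*"(\d*)"\s*\+\s*"(\d*)"\s*\)\s*;'
print(re.findall(pattern, s))  # [('6876', '52907')]

# The same pattern also matches a minified version of the statement.
compact = 'var j=i+Number("6876"+"52907");'
print(re.findall(pattern, compact))  # [('6876', '52907')]
```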


Upvotes: 2
