Reputation: 6841
I am trying to extract integers and variable values defined in JavaScript in an HTML file using Python 3's re.findall method.
However, I am having a little difficulty matching digits enclosed in " with \d*, and matching an alphanumeric string enclosed in " too.
Case 1:
import re
s = """
<script>
var i = 1636592595;
var j = i + Number("6876" + "52907");
</script>
"""
pattern = r'var j = i + Number(\"(\d*)\" + \"(\d*)\");'
m = re.findall(pattern, s)
print(m) # Output: []
The desired output should contain 6876 and 52907, but an empty list [] was obtained.
Case 2:
s = """
xhr.send(JSON.stringify({
"bm-foo": "AAQAAAAE/////4ytkgqq/oWI",
"pow": j
}));
"""
pattern = r'"bm-foo": \"(\w*)\",'
m = re.findall(pattern, s)
print(m) # Output: []
The desired output should contain AAQAAAAE/////4ytkgqq/oWI, but an empty list [] was obtained.
Can someone explain why my regex patterns are not matching?
Upvotes: 1
Views: 45
Reputation: 781096
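In the first regexp you need to escape +, (, and ), because they are regex metacharacters: an unescaped + is a quantifier, and unescaped parentheses create a group instead of matching literal characters.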
In the second regexp, use [^"]* instead of \w*, since \w doesn't match punctuation like /.
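To make both points concrete, here is a minimal sketch on toy strings. The inputs 'a+b' and 'f(x)' and all variable names are illustrative assumptions, not part of the original question; only the bm-foo value is taken from it.

```python
import re

# Unescaped "+" is a quantifier: r'a+b' means "one or more a, then b".
plus_unescaped = re.findall(r'a+b', 'a+b')
plus_escaped = re.findall(r'a\+b', 'a+b')

# Unescaped parentheses form a group: r'f(x)' matches the text "fx".
paren_unescaped = re.findall(r'f(x)', 'f(x)')
paren_escaped = re.findall(r'f\(x\)', 'f(x)')

print(plus_unescaped, plus_escaped)    # [] ['a+b']
print(paren_unescaped, paren_escaped)  # [] ['f(x)']

# \w matches only letters, digits, and underscore, so "/" splits the value.
value = 'AAQAAAAE/////4ytkgqq/oWI'
word_runs = re.findall(r'\w+', value)
whole_value = re.fullmatch(r'[^"]*', value)

print(word_runs)                # ['AAQAAAAE', '4ytkgqq', 'oWI']
print(whole_value is not None)  # True
```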
import re
s = """
<script>
var i = 1636592595;
var j = i + Number("6876" + "52907");
</script>
"""
pattern = r'var j = i \+ Number\("(\d*)" \+ "(\d*)"\);'
m = re.findall(pattern, s)
print(m) # Output: [('6876', '52907')]
s = """
xhr.send(JSON.stringify({
"bm-foo": "AAQAAAAE/////4ytkgqq/oWI",
"pow": j
}));
"""
pattern = r'"bm-foo": "([^"]*)",'
m = re.findall(pattern, s)
print(m) # Output: ['AAQAAAAE/////4ytkgqq/oWI']
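As an alternative to escaping each metacharacter by hand, the literal parts of the first pattern could be passed through re.escape and joined with the capture groups. This is only a sketch: the names literal_parts, escaped_pattern, and js_line are hypothetical, and the single-line input is a shortened version of the script above.

```python
import re

js_line = 'var j = i + Number("6876" + "52907");'

# Literal text surrounding the two numbers; re.escape backslash-escapes
# the regex metacharacters (+, (, and ) here) so they match literally.
literal_parts = ['var j = i + Number("', '" + "', '");']

# Put a digit-capturing group between each pair of escaped literals.
escaped_pattern = r'(\d*)'.join(re.escape(part) for part in literal_parts)

numbers = re.findall(escaped_pattern, js_line)
print(numbers)  # [('6876', '52907')]
```

This keeps the literal JavaScript text readable in the source, at the cost of a slightly longer pattern construction.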
Upvotes: 2