Reputation: 13
There is a JavaScript page I am attempting to scrape with BeautifulSoup:
bb2_addLoadEvent(function() {
    for ( i=0; i < document.forms.length; i++ ) {
        if (document.forms[i].method == 'post') {
            var myElement = document.createElement('input');
            myElement.setAttribute('type', 'hidden');
            myElement.name = 'bb2_screener_';
            myElement.value = '1568090530 122.44.202.205 122.44.202.205';
            document.forms[i].appendChild(myElement);
        }
I would like to obtain the value of myElement.value, but I am not familiar with how to do so (if it is even possible with BeautifulSoup).
I've tried:
soup = BeautifulSoup(a.text, 'html.parser')
h = soup.find('type')  # also tried 'div', 'input', and even 'var'
print(soup)
with no luck :(
Is there a way of obtaining the value? If so, how?
Upvotes: 1
Views: 87
Reputation: 84465
It would help to know how myElement.value varies across different pages. You might get away with a simple character set and lead string, as shown in the regex below. I would like to tighten it up but would need more examples. Perhaps those number lengths are fixed and repeating? Then something like
p = re.compile(r"myElement\.value = '(\d{10}(?:(\s\d{3}\.\d{2}\.\d{3}\.\d{3}){2}))';")
and then take group 1.
import re
s = '''bb2_addLoadEvent(function() {
for ( i=0; i < document.forms.length; i++ ) {
if (document.forms[i].method == 'post') {
var myElement = document.createElement('input');
myElement.setAttribute('type', 'hidden');
myElement.name = 'bb2_screener_';
myElement.value = '1568090530 122.44.202.205 122.44.202.205';
document.forms[i].appendChild(myElement);
}'''
p = re.compile(r"myElement\.value = '([\d\s\.]+)';")
print(p.findall(s)[0])
@SIM also kindly proposed:
p = re.compile(r"value[^']+'([^']*)'")
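For comparison, here is a rough sketch that runs a stricter pattern and @SIM's looser one against the sample line from the question. The stricter pattern assumes a 10-digit number followed by two dotted addresses with 1 to 3 digits per part; that shape is a guess from the single example, not something confirmed for other pages. The names script_text, strict and loose are only for illustration.

```python
import re

# Sample script text copied from the question.
script_text = """myElement.name = 'bb2_screener_';
myElement.value = '1568090530 122.44.202.205 122.44.202.205';
document.forms[i].appendChild(myElement);"""

# Stricter pattern: a 10-digit number followed by two dotted addresses.
# ASSUMPTION: the 10-digit width and 1-3 digit parts are inferred from one example.
strict = re.compile(
    r"myElement\.value = '(\d{10}(?:\s(?:\d{1,3}\.){3}\d{1,3}){2})';"
)

# Looser pattern proposed by @SIM: first quoted string after the word "value".
loose = re.compile(r"value[^']+'([^']*)'")

# search() finds the first match; group(1) is the text between the quotes.
strict_value = strict.search(script_text).group(1)
loose_value = loose.search(script_text).group(1)

print(strict_value)
print(loose_value)
```

The looser pattern is more forgiving if the value format changes, but it will also pick up any earlier quoted string that follows the word "value" in the script.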
Upvotes: 2
Reputation: 370729
If myElement.value = is static, this can be achieved with a simple regular expression:
value = re.compile(r"myElement\.value = '([^']+)'").search(str).group(1)
This matches myElement.value = ', followed by non-' characters, followed by another ', where all the non-' characters are captured in a group. Then group(1) extracts the group from the match.
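A minimal sketch of the above, with a guard for the case where the pattern is not found (search() returns None then, so calling group(1) directly would raise an AttributeError). The names page_source and get_value are hypothetical, chosen here for illustration and to avoid shadowing the built-in str.

```python
import re

VALUE_RE = re.compile(r"myElement\.value = '([^']+)'")

def get_value(page_source):
    """Return the quoted value assigned to myElement.value, or None if absent."""
    match = VALUE_RE.search(page_source)
    # search() returns None when nothing matches, so check before group(1).
    return match.group(1) if match else None

# Sample line taken from the question.
page_source = "myElement.value = '1568090530 122.44.202.205 122.44.202.205';"

found = get_value(page_source)
missing = get_value("var other = 'x';")

print(found)
print(missing)
```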
If the string may contain escaped 's as well, e.g.:
myElement.value = 'foo \' bar';
then alternate \\. with [^']:
myElement\.value = '((?:\\.|[^'])+)'
https://regex101.com/r/Tdarel/1
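To illustrate the difference, here is a small sketch comparing the two patterns on the escaped-quote example. The input string js_line is made up for the demonstration; the patterns are the ones given above.

```python
import re

# Hypothetical input containing an escaped single quote inside the value.
js_line = r"myElement.value = 'foo \' bar';"

simple = re.compile(r"myElement\.value = '([^']+)'")
escaped = re.compile(r"myElement\.value = '((?:\\.|[^'])+)'")

# The simple pattern stops at the first ', even though it is escaped.
simple_value = simple.search(js_line).group(1)

# The alternation consumes a backslash plus the following character as one
# unit, so the escaped quote stays inside the captured group.
escaped_value = escaped.search(js_line).group(1)

print(simple_value)
print(escaped_value)
```

Note that the captured text still contains the backslash; unescaping it is a separate step if the raw value is needed.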
Upvotes: 0