Reputation: 65
I am trying to access the URL inside this element
<script type="text/javascript">ReportPopper("http://asd.asd.asd/ReportOutput/asd-asd-41cc-asd-asd.xls");</script>
using BeautifulSoup.
Unfortunately, I'm not sure how to access the ReportPopper part and assign it to a variable in Python.
Sorry if this has already been answered. I tried passing ReportPopper to find('ReportPopper'), but it returns None.
import requests
import io
import os
from bs4 import BeautifulSoup
participation = requests.post(url=report_post_url,data=request_post_report_form,headers=report_post_headers,stream=True)
print(participation)
soup = BeautifulSoup(participation.text, 'html.parser')
for n in soup.find_all('script'):
    javascript = n['ReportPopper']
    print(javascript)
I want my end result to be:
javascript = "http://asd.asd.asd/ReportOutput/asd-asd-41cc-asd-asd.xls"
Instead, this is my output:
Traceback (most recent call last):
  File "c:\Users\John asd\Documents\GitHub\asd.net\testing.py", line 184, in <module>
    javascript = n['ReportPopper']
  File "C:\Users\John asd\asd\Local\Programs\Python\Python37\lib\site-packages\bs4\element.py", line 1016, in __getitem__
    return self.attrs[key]
KeyError: 'ReportPopper'
Upvotes: 3
Views: 328
Reputation: 4315
re.compile() returns a regular expression object, which means pattern is a regex object.
The regex object has its own search and match methods, both with the optional pos and endpos parameters:
regex.search(string[, pos[, endpos]])
from bs4 import BeautifulSoup
import re
html = """<script>ReportPopper("http://asd.asd.asd/ReportOutput/asd-asd-41cc-asd-asd.xls");</script>"""
soup = BeautifulSoup(html, 'lxml')
script = soup.find_all("script")
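# Escape the parentheses so they match literally; the group captures the argument
pattern = re.compile(r'ReportPopper\((.*)\);')
for i in script:
    strObj = i.text
    match = pattern.search(strObj)
    if match:
        # group(1) is the quoted URL between the parentheses
        print(match.group(1))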
Output:
"http://asd.asd.asd/ReportOutput/asd-asd-41cc-asd-asd.xls"
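The output above still carries the surrounding double quotes, while the question asks for the bare URL in a variable. A minimal sketch of one way to get that, assuming the argument is always a double-quoted string: the helper name extract_report_url and the constant REPORT_POPPER_RE are hypothetical, not part of bs4 or of the original code.

```python
import re

# Hypothetical helper (an assumption, not from the original answer):
# captures the first double-quoted argument of a ReportPopper("...") call.
REPORT_POPPER_RE = re.compile(r'ReportPopper\(\s*"([^"]*)"\s*\)')


def extract_report_url(script_text):
    """Return the URL passed to ReportPopper, or None if there is no such call."""
    match = REPORT_POPPER_RE.search(script_text or "")
    return match.group(1) if match else None


# In the loop above, script_text would be the text of each <script> tag.
script_text = 'ReportPopper("http://asd.asd.asd/ReportOutput/asd-asd-41cc-asd-asd.xls");'
javascript = extract_report_url(script_text)
print(javascript)
```

Matching [^"]* instead of .* stops at the closing quote, so trailing JavaScript on the same line should not leak into the result.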
Upvotes: 1
Reputation: 84465
With bs4 4.7.1+ you can use the :contains pseudo-class, provided that string is present in the response:
from bs4 import BeautifulSoup as bs
# r = requests.get(url)
# html = r.content
html = '<script type="text/javascript">ReportPopper("http://asd.asd.asd/ReportOutput/asd-asd-41cc-asd-asd.xls");</script>'
soup = bs(html, 'lxml')
s = soup.select_one('script:contains(ReportPopper)').text
url = s.split('"')[1]
print(url)
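As far as I know, newer soupsieve releases deprecate :contains in favour of :-soup-contains, so the selector may emit a warning depending on what is installed. A sketch that avoids the selector altogether is to filter the script tags by their string with find(). The html.parser backend and the None check are my assumptions here, not something taken from the question.

```python
import re
from bs4 import BeautifulSoup

html = '<script type="text/javascript">ReportPopper("http://asd.asd.asd/ReportOutput/asd-asd-41cc-asd-asd.xls");</script>'
soup = BeautifulSoup(html, 'html.parser')

# find() with string= keeps only <script> tags whose text matches the regex
tag = soup.find('script', string=re.compile('ReportPopper'))

url = None
if tag is not None:
    # the URL is the first double-quoted chunk of the script text
    url = tag.string.split('"')[1]
print(url)
```

Checking for None first avoids an AttributeError when the response does not contain a ReportPopper call at all.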
Upvotes: 1