Reputation: 333
I am trying to build a download manager script with Python. The web page contains several script tags, and I want to isolate one in particular: the script that calls html5player.setVideoUrlHigh('https://*****');
I don't know how to go about it. I was able to get all the script tags, but I am unable to pick out the one containing html5player.setVideoUrlHigh('https://*****');
Here is my Python code:
from urllib.request import urlopen
import re
from bs4 import BeautifulSoup
Url = '*****'
pg = urlopen(Url)
sp = BeautifulSoup(pg)
script_tag = sp.find_all('script')
# print(script_tag[1])
print(re.search("setVideoHLS\(\'(.*?)\'\)", script_tag).group(1))
The script tag I want to get is this:
<script>
logged_user = false;
var static_id_cdn = 17;
var html5player = new HTML5Player('html5video', '56420147');
if (html5player) {
html5player.setVideoTitle('passionate hotel room');
html5player.setSponsors(false);
html5player.setVideoUrlLow('https://*****');
html5player.setVideoUrlHigh('https://******');
html5player.setVideoHLS('https://****');
html5player.setThumbUrl('https://**');
html5player.setStaticDomain('***');
html5player.setHttps();
html5player.setCanUseHttps();
document.getElementById('html5video').style.minHeight = '';
html5player.initPlayer();
}
How can I get the parameter passed to this function: `html5player.setVideoUrlHigh('https://******')`?
Upvotes: 0
Views: 270
Reputation: 1994
You can get the contents of the script tag using this code:
import re
from bs4 import BeautifulSoup
html = """<script> logged_user = false;
var static_id_cdn = 17;
var html5player = new HTML5Player('html5video', '56420147');
if (html5player) {
html5player.setVideoTitle('passionate hotel room');
html5player.setSponsors(false);
html5player.setVideoUrlLow('https://*****');
html5player.setVideoUrlHigh('https://******');
html5player.setVideoHLS('https://****');
html5player.setThumbUrl('https://**');
html5player.setStaticDomain('***');
html5player.setHttps();
html5player.setCanUseHttps();
document.getElementById('html5video').style.minHeight = '';
html5player.initPlayer();
}</script>"""
# The variable is named html (lowercase); naming the parser avoids a bs4 warning
soup = BeautifulSoup(html, "html.parser")
txt = soup.script.get_text()
print(txt)
Output:
logged_user = false;
var static_id_cdn = 17;
var html5player = new HTML5Player('html5video', '56420147');
if (html5player) {
html5player.setVideoTitle('passionate hotel room');
html5player.setSponsors(false);
html5player.setVideoUrlLow('https://*****');
html5player.setVideoUrlHigh('https://******');
html5player.setVideoHLS('https://****');
html5player.setThumbUrl('https://**');
html5player.setStaticDomain('***');
html5player.setHttps();
html5player.setCanUseHttps();
document.getElementById('html5video').style.minHeight = '';
html5player.initPlayer();
}
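Once the script text is available as a string, the argument can be pulled out with a regular expression. The following is a minimal sketch, assuming each call passes a single single-quoted argument as in the snippet above. The helper name `extract_setter_args`, the sample text, and the example.com URLs are hypothetical and only there for illustration.

```python
import re

# Hypothetical stand-in for the text returned by soup.script.get_text();
# the example.com URLs are placeholders, not values from the real page.
txt = """
var html5player = new HTML5Player('html5video', '56420147');
if (html5player) {
    html5player.setVideoTitle('sample title');
    html5player.setVideoUrlLow('https://example.com/low.mp4');
    html5player.setVideoUrlHigh('https://example.com/high.mp4');
    html5player.setVideoHLS('https://example.com/hls.m3u8');
    html5player.initPlayer();
}
"""


def extract_setter_args(script_text):
    """Map each html5player.<method>('<arg>') call to its single-quoted argument."""
    # Raw string so the backslashes reach the regex engine unchanged.
    pattern = r"html5player\.(\w+)\('([^']*)'\)"
    return dict(re.findall(pattern, script_text))


args = extract_setter_args(txt)
print(args.get("setVideoUrlHigh"))
```

Calls without a quoted argument, such as `initPlayer()`, are skipped by the pattern, so a missing key simply yields `None` from `args.get(...)`.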
import requests
import bs4
import re
url = 'url'
r = requests.get(url)
bs = bs4.BeautifulSoup(r.text, "html.parser")
scripts = bs.find_all('script')
src = scripts[7]  # the needed script is at index 7 on this page
# Raw string avoids invalid-escape warnings; the dot is escaped to match literally
print(re.search(r"html5player\.setVideoUrlHigh\('(.*?)'\)", str(src)).group(1))
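Relying on a fixed index such as `scripts[7]` will break if the page adds or removes a script tag. A possible alternative is to scan every script tag and return the first match, which also ensures `re.search` always receives a string and not the whole `find_all` result. This sketch runs on an inline HTML string so it works without a network request; the function name `find_video_url_high` and the example.com URLs are assumptions for illustration.

```python
import re
from bs4 import BeautifulSoup

# Raw string so the backslashes reach the regex engine unchanged.
URL_HIGH_RE = re.compile(r"html5player\.setVideoUrlHigh\('(.*?)'\)")


def find_video_url_high(html):
    """Return the setVideoUrlHigh argument from the first script that has one, else None."""
    soup = BeautifulSoup(html, "html.parser")
    for script in soup.find_all("script"):
        # script.string is None for empty or external (src=...) scripts.
        match = URL_HIGH_RE.search(script.string or "")
        if match:
            return match.group(1)
    return None


# Hypothetical page: the wanted script is not at a known position.
sample_html = """
<html><head>
<script src="https://example.com/lib.js"></script>
<script>var unrelated = 1;</script>
</head><body>
<script>
var html5player = new HTML5Player('html5video', '56420147');
html5player.setVideoUrlLow('https://example.com/low.mp4');
html5player.setVideoUrlHigh('https://example.com/high.mp4');
</script>
</body></html>
"""

print(find_video_url_high(sample_html))
```

For a live page, the same function could be called with `requests.get(url).text` in place of `sample_html`.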
Upvotes: 1