Js doee

Reputation: 333

How to get a particular script tag?

I am trying to build a download manager script with Python. The web page contains several script tags, and I want to isolate one in particular: the script that contains html5player.setVideoUrlHigh('https://*****');

I don't know how to go about it. I was able to get all the script tags, but I am unable to pick out the one containing html5player.setVideoUrlHigh('https://*****');

Here is my Python code:

from urllib.request import urlopen
import re
from bs4 import BeautifulSoup
Url = '*****'
pg = urlopen(Url)
sp = BeautifulSoup(pg)
script_tag = sp.find_all('script')
# print(script_tag[1])
print(re.search("setVideoHLS\(\'(.*?)\'\)", script_tag).group(1))

The script tag I want to get is this:

<script>
    logged_user = false;
    var static_id_cdn = 17;
    var html5player = new HTML5Player('html5video', '56420147');
    if (html5player) {
        html5player.setVideoTitle('passionate hotel room');
        html5player.setSponsors(false);
        html5player.setVideoUrlLow('https://*****');
        html5player.setVideoUrlHigh('https://******');
        html5player.setVideoHLS('https://****');
        html5player.setThumbUrl('https://**');
        html5player.setStaticDomain('***');
        html5player.setHttps();
        html5player.setCanUseHttps();
        document.getElementById('html5video').style.minHeight = '';
        html5player.initPlayer();
    }
</script>

How can I get the parameter passed to `html5player.setVideoUrlHigh('https://******')`?

Upvotes: 0

Views: 270

Answers (1)

Ransaka Ravihara

Reputation: 1994

You can get the script tag's contents using this code:

import re
from bs4 import BeautifulSoup

html = """<script>    logged_user = false;
var static_id_cdn = 17;
var html5player = new HTML5Player('html5video', '56420147');
if (html5player) {
    html5player.setVideoTitle('passionate hotel room');
    html5player.setSponsors(false);
    html5player.setVideoUrlLow('https://*****');
    html5player.setVideoUrlHigh('https://******');
    html5player.setVideoHLS('https://****');
    html5player.setThumbUrl('https://**');
    html5player.setStaticDomain('***');
    html5player.setHttps();
    html5player.setCanUseHttps();
    document.getElementById('html5video').style.minHeight = '';
    html5player.initPlayer();
}</script>"""

# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, "html.parser")

# Text content of the first <script> tag in the document
txt = soup.script.get_text()
print(txt)

Output:

    logged_user = false;
var static_id_cdn = 17;
var html5player = new HTML5Player('html5video', '56420147');
if (html5player) {
    html5player.setVideoTitle('passionate hotel room');
    html5player.setSponsors(false);
    html5player.setVideoUrlLow('https://*****');
    html5player.setVideoUrlHigh('https://******');
    html5player.setVideoHLS('https://****');
    html5player.setThumbUrl('https://**');
    html5player.setStaticDomain('***');
    html5player.setHttps();
    html5player.setCanUseHttps();
    document.getElementById('html5video').style.minHeight = '';
    html5player.initPlayer();
}
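
Once the script text is available as a string, a regular expression can pull out the quoted arguments. The following is only a sketch: the helper name extract_player_calls and the example.com URLs are made up for illustration, and it assumes every call of interest takes a single single-quoted argument, as in the script shown in the question.

```python
import re

# Matches html5player.<method>('<argument>') and captures the method name
# and the single-quoted argument. Calls without a quoted argument are skipped.
CALL_RE = re.compile(r"html5player\.(\w+)\('([^']*)'\)")


def extract_player_calls(script_text):
    """Map each html5player method name to its single-quoted argument (hypothetical helper)."""
    return dict(CALL_RE.findall(script_text))


# Stand-in script text with placeholder URLs, for illustration only
sample_txt = """
html5player.setVideoTitle('demo');
html5player.setSponsors(false);
html5player.setVideoUrlHigh('https://example.com/high.mp4');
html5player.setVideoHLS('https://example.com/hls.m3u8');
"""

calls = extract_player_calls(sample_txt)
print(calls["setVideoUrlHigh"])
```

Collecting all calls into a dictionary makes it easy to fall back to another entry, such as setVideoUrlLow, if the high-quality URL is missing.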

EDIT

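import requests
import bs4
import re

url = 'url'  # placeholder: replace with the address of the page
r = requests.get(url)
bs = bs4.BeautifulSoup(r.text, "html.parser")
scripts = bs.find_all('script')
src = scripts[7]  # on this page the needed script is at index 7
# Raw string so the regex escapes are not treated as string escapes;
# the group captures the URL passed to setVideoUrlHigh
print(re.search(r"html5player\.setVideoUrlHigh\('(.*?)'\)", str(src)).group(1))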
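
The position of the script can change from page to page, so a hard-coded index is fragile. One possible alternative, sketched below, is to loop over the script tags and return the first match. The function name find_high_url and the sample page are hypothetical; on a real page the HTML would come from requests.get(url).text as above.

```python
import re

from bs4 import BeautifulSoup

# Matches html5player.setVideoUrlHigh('...') and captures the quoted argument
HIGH_URL_RE = re.compile(r"html5player\.setVideoUrlHigh\('(.*?)'\)")


def find_high_url(page_html):
    """Return the setVideoUrlHigh argument from the first script containing it, else None."""
    soup = BeautifulSoup(page_html, "html.parser")
    for script in soup.find_all("script"):
        # .string is None for scripts with no inline content (e.g. src=... only)
        match = HIGH_URL_RE.search(script.string or "")
        if match:
            return match.group(1)
    return None


# Stand-in page with placeholder URLs, for illustration only
sample_page = """
<html><head><script src="https://example.com/lib.js"></script></head>
<body><script>
var html5player = new HTML5Player('html5video', '56420147');
html5player.setVideoUrlLow('https://example.com/low.mp4');
html5player.setVideoUrlHigh('https://example.com/high.mp4');
</script></body></html>
"""

print(find_high_url(sample_page))
```

Returning None when nothing matches avoids the AttributeError that calling .group(1) on a failed re.search would raise.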

Upvotes: 1
