Python(BeautifulSoup) - Get href from <script>

Question

I'm working on "Video Downloader" and I have one problem with BeautifulSoup4.

Here is the part of the HTML from which I want to get the href:

And here is the href which I want to print:

href="http://s896.vshare.io/download,9999999999999999999999999999999999999999-f6192405453bf5ff3cfe41a488d8390d,5944ed28,4d948c5.avi"

I tried this, but it doesn't work.

for a in soup3.find_all('a'):
    if 'href' in a.attrs:
        print(a['href'])

Szymon · Accepted Answer

Beautiful Soup can parse HTML and XML, not JavaScript, so a link that only appears inside a script's source is plain text to it and find_all('a') does not return it. You can use a regular expression to search that code instead.
Using <a[^>]*?(href=\"([^\">]+)\") you can match any part of the code which:

- <a - is an a tag
- [^>]*? - can have any characters that are not >
- href=" - has an href
- [^\">]+ - has any number of characters other than " and >
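]+ - have any number">
To see what the two capture groups hold, here is a minimal sketch that runs the pattern on a made-up script body. The script_text string and the example.com URL are assumptions for illustration only; the real page's JavaScript is not shown in the question. It also checks for None, because re.search returns None when nothing matches.

```python
import re

# Hypothetical script body (an assumption, not the real page's JavaScript).
script_text = """document.write('<a class="btn" href="http://example.com/video.avi">Download</a>');"""

# Same pattern as described above: an <a tag, any non-> characters, then href="...".
match = re.search(r'<a[^>]*?(href="([^">]+)")', script_text)

if match is None:
    print("no href found")
else:
    print(match.group(1))  # whole attribute, including the quotes
    print(match.group(2))  # only the URL
```

Group 1 is the full href="..." attribute and group 2 is just the value between the quotes, so group 2 is usually the one you want to pass to a downloader.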

To extract the script code from the HTML you can use

script = soup.find('script', {'type': 'text/javascript'})

and then to parse it, use

re.search(r"<a[^>]*?(href=\"([^\">]+)\")", script.text)

Remember to import re first.


print(re.search(r"<a[^>]*?(href=\"([^\">]+)\")", script.text)[1])
# href="http://s896.vshare.io/download,9999999999999999999999999999999999999999-f6192405453bf5ff3cfe41a488d8390d,5944ed28,4d948c5.avi"
print(re.search(r"<a[^>]*?(href=\"([^\">]+)\")", script.text)[2])
# http://s896.vshare.io/download,9999999999999999999999999999999999999999-f6192405453bf5ff3cfe41a488d8390d,5944ed28,4d948c5.avi


Read about regular expressions. If you are going to use the pattern often, compile it first.

https://docs.python.org/3/library/re.html
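Compiling the pattern could look roughly like the sketch below. The names HREF_IN_A, extract_hrefs and sample are hypothetical, and the pattern is simplified to a single capture group (only the URL) so that findall returns a plain list of strings; this is one possible arrangement, not the only one.

```python
import re

# Compiled once at import time and reused for every script body that is scanned.
# Simplified variant of the pattern above: one group, capturing only the URL.
HREF_IN_A = re.compile(r'<a[^>]*?href="([^">]+)"')


def extract_hrefs(script_text):
    """Return every href value found in <a ...> markup inside script_text."""
    return HREF_IN_A.findall(script_text)


# Hypothetical script source with two links (an assumption for the example).
sample = (
    "var a = '<a href=\"http://example.com/one.avi\">1</a>';\n"
    "var b = '<a id=\"x\" href=\"http://example.com/two.avi\">2</a>';"
)

hrefs = extract_hrefs(sample)
print(hrefs)
```

With the soup from the answer, the call would be extract_hrefs(script.text); an empty list means no link was found.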
