Reputation: 33
When I try to pull a particular href for a .zip file, all that is returned is #.
I have stripped my script down so that only the tricky part is left. When I run it against the test HTML (a copy of the target site's markup), without going through opener.open, it works fine. When I run it against the actual site, I only get #.
Any help would be very much appreciated.
#!/usr/bin/env python3
from bs4 import BeautifulSoup
import urllib.request

# Custom opener that sends a browser-like User-Agent
class Opener(urllib.request.FancyURLopener):
    version = "Mozilla/5.0"

opener = Opener()
# Local copy of the download button markup (works when parsed directly)
test = '<a id="dlbutton" href="https://www55.zippyshare.com/d/H7prSkjz/2176/Barrier%20Line%20Riddim%20-%20%20J.%20Small%20Records.zip"><div class="download"></div></a>'
# Live page (only returns "#")
dstar = 'https://www55.zippyshare.com/v/H7prSkjz/file.html'

def grabzip(url):
    link = BeautifulSoup(opener.open(url), "html.parser")
    for ziplink in link.find_all('a', id="dlbutton"):
        print(ziplink.get('href'))

grabzip(dstar)
Upvotes: 0
Views: 71
Reputation: 57175
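The download link is most likely filled in by JavaScript after the page loads, so the raw HTML you fetch only contains the placeholder #. You can use Selenium with ChromeDriver to load the page in a real browser and read the href once it has been set: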
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# Run Chrome without opening a window
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(options=chrome_options)

driver.get("https://www55.zippyshare.com/v/H7prSkjz/file.html")
# Read the href after the browser has executed the page's scripts
print(driver.find_element(By.ID, "dlbutton").get_attribute("href"))
driver.quit()
https://www55.zippyshare.com/d/H7prSkjz/16761/Barrier%20Line%20Riddim%20-%20%20J.%20Small%20Records.zip
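To see why a plain fetch would print # while a browser shows the real link, here is a minimal sketch using only the standard library. The markup in SAMPLE_HTML is a hypothetical illustration, not a copy of the real page: it assumes the anchor ships with a placeholder href and that a script overwrites it in the browser. HrefFinder and static_href are names made up for this example.

```python
from html.parser import HTMLParser

# Hypothetical markup, NOT copied from the real page: the anchor ships with a
# placeholder href and a script is assumed to overwrite it in the browser.
SAMPLE_HTML = """
<a id="dlbutton" href="#"><div class="download"></div></a>
<script>
  document.getElementById('dlbutton').href = '/d/EXAMPLE/1234/file.zip';
</script>
"""


class HrefFinder(HTMLParser):
    """Collect the href of the first <a> tag with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.href = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a" and attr_map.get("id") == self.target_id and self.href is None:
            self.href = attr_map.get("href")


def static_href(html, target_id="dlbutton"):
    """Return the href as it appears in the raw HTML, without running scripts."""
    finder = HrefFinder(target_id)
    finder.feed(html)
    return finder.href


# An HTML parser never executes the <script>, so it only sees the placeholder.
print(static_href(SAMPLE_HTML))  # prints "#"
```

If the live page behaves like this sketch, any parser-only approach (BeautifulSoup included) will keep returning #, which is why the answer drives a real browser instead.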
Upvotes: 1