Michael Tsu

Reputation: 59

BeautifulSoup with dynamic href

I'm using Python 3.4 and BeautifulSoup to grab a zip file from a web page so I can download it and unzip it into a folder. I can get BeautifulSoup to print() all the hrefs on the page, but I want one specific href: the one ending in "=Hospital_Revised_Flatfiles.zip". Is that possible? This is what I have so far; it only lists the hrefs found at the URL.

The full href of the file is https://data.medicare.gov/views/bg9k-emty/files/Dlx5-ywq01dGnGrU09o_Cole23nv5qWeoYaL-OzSLSU?content_type=application%2Fzip%3B%20charset%3Dbinary&filename=Hospital_Revised_Flatfiles.zip , but the token in the middle of the path changes whenever they update the file, and there is no way of knowing what it will change to.

Please let me know if I left anything out of the question that might be helpful. I'm using Python 3.4 and BeautifulSoup4 (bs4).

from bs4 import BeautifulSoup 
import requests
import re

url = "https://data.medicare.gov/data/hospital-compare"

r = requests.get(url)

data = r.text

soup = BeautifulSoup(data, "html.parser")

for link in soup.find_all('a'):
    print(link.get('href'))

Upvotes: 0

Views: 767

Answers (1)

WhoAmI

Reputation: 501

from bs4 import BeautifulSoup
import requests

url = "https://data.medicare.gov/data/hospital-compare"

r = requests.get(url)

data = r.text

# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(data, "html.parser")

for link in soup.find_all('a'):
    # get() returns None when the tag has no href attribute
    href = link.get('href')
    if href and href.endswith("=Hospital_Revised_Flatfiles.zip"):
        print(href)
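
Since the question also asks about downloading and unzipping the file, here is a rough sketch of how the matching link could be found and the archive unpacked. It assumes the link is present in the HTML that requests receives (if the page builds it with JavaScript, this will not find it). The helper names, the dest_dir folder name, and the timeout values are made up for illustration, and it passes a compiled regex as the href filter instead of looping over every link.

```python
import io
import re
import zipfile
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

PAGE_URL = "https://data.medicare.gov/data/hospital-compare"
ZIP_NAME = "Hospital_Revised_Flatfiles.zip"


def find_zip_href(html, base_url, filename=ZIP_NAME):
    """Return the absolute URL of the first <a> whose href ends with filename, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # A compiled regex as the href filter is searched against each href value
    pattern = re.compile(re.escape(filename) + r"$")
    link = soup.find("a", href=pattern)
    if link is None:
        return None
    # Resolve relative hrefs against the page URL
    return urljoin(base_url, link["href"])


def extract_zip_bytes(payload, dest_dir):
    """Unpack an in-memory zip archive into dest_dir and return the member names."""
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


def download_and_extract(page_url=PAGE_URL, dest_dir="hospital_flatfiles"):
    """Fetch the page, locate the zip link, download it, and unpack it (hypothetical helper)."""
    page = requests.get(page_url, timeout=30)
    page.raise_for_status()
    zip_url = find_zip_href(page.text, page_url)
    if zip_url is None:
        raise RuntimeError("no link ending in " + ZIP_NAME)
    archive = requests.get(zip_url, timeout=120)
    archive.raise_for_status()
    return extract_zip_bytes(archive.content, dest_dir)
```

Calling download_and_extract() would then fetch the page and unpack the archive into the hospital_flatfiles folder. Matching on the file name at the end of the href means the changing token in the middle of the path does not matter.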

Upvotes: 1
