I'm trying to use Python 3.4 and BeautifulSoup to grab a zip file from a webpage so I can download it and unzip it into a folder. I can get BeautifulSoup to print() all the hrefs on the page, but I want the specific href that ends in "=Hospital_Revised_Flatfiles.zip". Is that possible? Below is what I have so far; it only lists the hrefs from the URL.
The full href of the file is https://data.medicare.gov/views/bg9k-emty/files/Dlx5-ywq01dGnGrU09o_Cole23nv5qWeoYaL-OzSLSU?content_type=application%2Fzip%3B%20charset%3Dbinary&filename=Hospital_Revised_Flatfiles.zip, but the token in the middle of the path changes whenever they update the file, and there is no way of knowing what it will change to.
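Since only the token in the middle of the path changes, one option is to check the `filename` query parameter instead of the raw end of the string. Below is a minimal sketch using only `urllib.parse`. The helper name `is_target_zip` is hypothetical, and the sketch assumes the parameter keeps the name `filename` as in the URL above. Unlike a plain `endswith` check, it still matches if the site reorders the query parameters.

```python
from urllib.parse import urlsplit, parse_qs

# Assumption: the site keeps naming the file via a `filename` query parameter
TARGET = "Hospital_Revised_Flatfiles.zip"


def is_target_zip(href, target=TARGET):
    """Return True if the href's `filename` query parameter equals target.

    Only the query string is inspected, so the changing path token
    (between /files/ and ?) does not affect the result.
    """
    query = urlsplit(href).query
    # parse_qs percent-decodes values and returns a list per key
    filenames = parse_qs(query).get("filename", [])
    return target in filenames


href = ("https://data.medicare.gov/views/bg9k-emty/files/"
        "Dlx5-ywq01dGnGrU09o_Cole23nv5qWeoYaL-OzSLSU"
        "?content_type=application%2Fzip%3B%20charset%3Dbinary"
        "&filename=Hospital_Revised_Flatfiles.zip")
print(is_target_zip(href))  # True
```

Inside the question's loop, this check would replace the string comparison: `if is_target_zip(link.get('href') or ''):`.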
Please let me know if I left out anything that might be helpful. I'm using Python 3.4 and BeautifulSoup4 (bs4).
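from bs4 import BeautifulSoup
import requests
import re

url = "https://data.medicare.gov/data/hospital-compare"
r = requests.get(url)
data = r.text
# Name the parser explicitly; bs4 emits a warning when none is given
soup = BeautifulSoup(data, "html.parser")

# Print every link's href (None for <a> tags without one)
for link in soup.find_all('a'):
    print(link.get('href'))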
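Loop over the <a> tags and keep only the href that ends with the file name:

from bs4 import BeautifulSoup
import requests

url = "https://data.medicare.gov/data/hospital-compare"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "html.parser")

# href=True skips <a> tags that have no href attribute
for link in soup.find_all('a', href=True):
    # Match on the stable end of the URL; the token in the middle changes
    if link['href'].endswith("=Hospital_Revised_Flatfiles.zip"):
        print(link['href'])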
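After the matching href is found, the remaining step from the question is to download the archive and unzip it. The sketch below assumes the archive is small enough to hold in memory. It uses the standard library's `urllib.request`, so it needs nothing beyond Python itself; with requests, `requests.get(url).content` gives the same bytes. The helpers `extract_zip_bytes` and `download_and_extract` are hypothetical names, and `urljoin` is there only in case the page ever serves a relative href.

```python
import io
import os
import zipfile
from urllib.parse import urljoin
from urllib.request import urlopen

PAGE_URL = "https://data.medicare.gov/data/hospital-compare"


def extract_zip_bytes(payload, dest_dir):
    """Unpack an in-memory zip archive into dest_dir; return the member names."""
    os.makedirs(dest_dir, exist_ok=True)  # create the target folder if needed
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


def download_and_extract(href, dest_dir, base_url=PAGE_URL):
    """Fetch the zip at href (absolute or relative) and unpack it into dest_dir."""
    # urljoin returns absolute hrefs unchanged and resolves relative ones
    url = urljoin(base_url, href)
    with urlopen(url) as response:
        payload = response.read()
    return extract_zip_bytes(payload, dest_dir)
```

Inside the loop above, you would call `download_and_extract(link['href'], "hospital_data")` instead of printing, where "hospital_data" is just an example folder name. For a very large archive, streaming the response to a file on disk before extracting would avoid holding it all in memory.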