Reputation: 7451
I'm using BeautifulSoup in my Python code to download an image from a website that changes regularly. It all works well.
However, the page (https://apod.nasa.gov/apod/astropix.html) shows a lower-resolution image (which my code currently downloads), and clicking that image takes you to a higher-resolution version of the same image.
Can someone please suggest how I can change my code so that it downloads the higher-resolution image instead? My current code:
from bs4 import BeautifulSoup as BSHTML
import requests
import subprocess
import urllib2
page = urllib2.urlopen('https://apod.nasa.gov/apod/astropix.html')
soup = BSHTML(page,features="html.parser")
images = soup.findAll('img')
url = 'https://apod.nasa.gov/apod/'+images[0]['src']
r = requests.get(url, allow_redirects=True)
with open('/home/me/Downloads/apod.jpg', "wb") as f:
    f.write(r.content)
Upvotes: 2
Views: 245
Reputation: 908
You need to download and write to disk:
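import requests
from bs4 import BeautifulSoup
from os.path import basename

r = requests.get("xxx")  # placeholder URL, as in the original answer
soup = BeautifulSoup(r.content, "html.parser")
links = soup.find_all("img")  # assumption: `links` was never defined in the original
for link in links:
    src = link.get("src")
    # skip <img> tags without a src, and relative URLs
    if src and "http" in src:
        with open(basename(src), "wb") as f:
            f.write(requests.get(src).content)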
You can also use a select to filter your tags to only get the ones with http links:
for link in soup.select("img[src^=http]"):
    lnk = link["src"]
    with open(basename(lnk), "wb") as f:
        f.write(requests.get(lnk).content)
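One caveat with basename(lnk): if an image URL carries a query string, that string ends up in the filename. A minimal sketch for Python 3, assuming a hypothetical helper named `filename_from_url` (not part of the answer above), takes the name from the URL's path only:

```python
import posixpath
from urllib.parse import urlparse


def filename_from_url(url, default="download.bin"):
    """Return the last path segment of `url`, ignoring query and fragment.

    `filename_from_url` and `default` are hypothetical names used only
    for illustration.
    """
    path = urlparse(url).path          # e.g. "/img/a.jpg"
    name = posixpath.basename(path)    # URL paths always use "/"
    return name or default             # fall back if the path has no file part
```

It could stand in for basename(lnk) in either loop, e.g. open(filename_from_url(lnk), "wb").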
Upvotes: 1
Reputation: 195438
You can select the <a> tag that contains the <img>; its "href" attribute holds the URL of the full-size image:
import requests
from bs4 import BeautifulSoup as BSHTML
page = requests.get("https://apod.nasa.gov/apod/astropix.html")
soup = BSHTML(page.content, features="html.parser")
image_url = (
    "https://apod.nasa.gov/apod/" + soup.select_one("a:has(>img)")["href"]
)
r = requests.get(image_url, allow_redirects=True)
with open("/home/paul/Downloads/apod.jpg", "wb") as f:
    f.write(r.content)
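The string concatenation above relies on the "href" being relative to https://apod.nasa.gov/apod/. As a hedged alternative for Python 3, the standard library's urljoin resolves the link against the page URL and also copes with an absolute "href". The helper name `resolve_link` and the example paths in the comments are made up for illustration:

```python
from urllib.parse import urljoin

PAGE_URL = "https://apod.nasa.gov/apod/astropix.html"


def resolve_link(href, page_url=PAGE_URL):
    """Resolve `href` against the page it was found on.

    `resolve_link` is a hypothetical helper; a relative href such as
    "image/some/file.jpg" is joined to the page's directory, while an
    absolute href is returned unchanged.
    """
    return urljoin(page_url, href)
```

In the code above this would replace the concatenation, e.g. image_url = resolve_link(soup.select_one("a:has(>img)")["href"]).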
Upvotes: 2