user1551817

Reputation: 7451

Download an image using BeautifulSoup

I'm using BeautifulSoup in my Python code to download an image from a website that changes regularly. It all works well.

However, the page (https://apod.nasa.gov/apod/astropix.html) shows a lower-resolution image (which my code currently downloads), and clicking that image takes you to a higher-resolution version of the same image.

Can someone please suggest how I can change my code so that it downloads the higher-resolution image?

from bs4 import BeautifulSoup as BSHTML
import requests
import subprocess
import urllib2
page = urllib2.urlopen('https://apod.nasa.gov/apod/astropix.html')
soup = BSHTML(page,features="html.parser")
images = soup.findAll('img')

url = 'https://apod.nasa.gov/apod/'+images[0]['src']
r = requests.get(url, allow_redirects=True)
with open('/home/me/Downloads/apod.jpg',"w") as f:
    f.write(r.content)

Upvotes: 2

Views: 245

Answers (2)

jahantaila

Reputation: 908

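You need to download each image and write it to disk in binary mode:

import requests
from bs4 import BeautifulSoup
from os.path import basename

r = requests.get("xxx")
soup = BeautifulSoup(r.content, "html.parser")

# Collect every <img> tag on the page
links = soup.find_all("img")
for link in links:
    lnk = link.get("src")
    # Only absolute URLs can be fetched directly
    if lnk and "http" in lnk:
        # "wb" because image data is bytes, not text
        with open(basename(lnk), "wb") as f:
            f.write(requests.get(lnk).content)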

You can also use a CSS selector to keep only the tags whose src starts with http:

for link in soup.select("img[src^=http]"):
    lnk = link["src"]
    with open(basename(lnk), "wb") as f:
        f.write(requests.get(lnk).content)
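
Note that both loops above skip images whose src is a relative path. If you want those too, one option is to resolve each src against the page URL before downloading. A minimal sketch, using only the standard library; the helper name resolve_image and the example.com URLs are made up for illustration:

```python
from posixpath import basename
from urllib.parse import urljoin, urlparse


def resolve_image(page_url, src):
    """Return (absolute_url, filename) for an <img> src value.

    Hypothetical helper: works for both relative and absolute src values.
    """
    # urljoin leaves absolute URLs untouched and resolves relative ones
    absolute = urljoin(page_url, src)
    # Take the filename from the URL path only, ignoring any query string
    filename = basename(urlparse(absolute).path)
    return absolute, filename


# Illustrative values, not taken from a real page
url, name = resolve_image("https://example.com/gallery/index.html", "img/cat.png?v=2")
print(url, name)
```

You would then call requests.get(url) and write the content to a file called name, the same way as in the loops above.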

Upvotes: 1

Andrej Kesely

Reputation: 195438

You can select the <a> tag that contains the <img>; its "href" attribute holds the URL of the higher-resolution image:

import requests
from bs4 import BeautifulSoup as BSHTML

page = requests.get("https://apod.nasa.gov/apod/astropix.html")
soup = BSHTML(page.content, features="html.parser")

image_url = (
    "https://apod.nasa.gov/apod/" + soup.select_one("a:has(>img)")["href"]
)

r = requests.get(image_url, allow_redirects=True)
with open("/home/paul/Downloads/apod.jpg", "wb") as f:
    f.write(r.content)
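
If you want something a bit more defensive, here is a rough sketch of the same idea that returns None when no matching link is found and builds the URL with urljoin instead of string concatenation, so an absolute href would also work. The function name full_res_url and the sample HTML (including its file names) are made up so the snippet runs without network access; the real page's markup may differ:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def full_res_url(html, page_url):
    """Return the absolute URL linked from the first <a> that directly wraps an <img>."""
    soup = BeautifulSoup(html, "html.parser")
    link = soup.select_one("a:has(>img)")
    # select_one returns None when nothing matches, so check before indexing
    if link is None or not link.get("href"):
        return None
    return urljoin(page_url, link["href"])


# Hypothetical markup shaped like "a thumbnail that links to a larger image"
SAMPLE_HTML = """
<a href="image/2101/example_big.jpg">
<img src="image/2101/example_small.jpg" alt="sample">
</a>
"""
PAGE_URL = "https://apod.nasa.gov/apod/astropix.html"

print(full_res_url(SAMPLE_HTML, PAGE_URL))
```

In a real script you would pass page.content and the page URL, then download the returned URL with requests.get as shown above.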

Upvotes: 2
