Hardik Manek

Reputation: 25

Modifying the URL parameter to download images from multiple websites

I am trying to download the images for every case listed in the CaseIDs array, but it doesn't work. I want the code to run for all cases.

from bs4 import BeautifulSoup
import requests as rq
from urllib.parse import urljoin
from tqdm import tqdm

CaseIDs = [100237, 99817, 100271]

with rq.session() as s:
    for caseid in tqdm(CaseIDs):
        url = 'https://crashviewer.nhtsa.dot.gov/nass-CIREN/CaseForm.aspx?xsl=main.xsl&CaseID= {caseid}'
        r = s.get(url)
        soup = BeautifulSoup(r.text, "html.parser")

        url = urljoin(url, soup.find('a', text='Text and Images Only')['href'])
        r = s.get(url)
        soup = BeautifulSoup(r.text, "html.parser")

        links = [urljoin(url, i['src']) for i in soup.select('img[src^="GetBinary.aspx"]')]

        count = 0
        for link in links:
            content = s.get(link).content
            with open("test_image" + str(count) + ".jpg", 'wb') as f:
                f.write(content)
            count += 1

Upvotes: 1

Views: 63

Answers (2)

Kingindanord

Reputation: 2036

Try using format() like this:

url = 'https://crashviewer.nhtsa.dot.gov/nass-CIREN/CaseForm.aspx?xsl=main.xsl&CaseID={}'.format(caseid)
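For the loop over all cases, the same idea could look roughly like the sketch below. It only builds the URLs and file names, without making any requests. The helper names build_case_url and image_filename are hypothetical (not from the question), and the per-case file name is an assumption added because the question's "test_image" + str(count) name restarts at 0 for every case, so later cases would overwrite earlier files.

```python
# URL template with a {} placeholder that str.format() fills in.
BASE_URL = 'https://crashviewer.nhtsa.dot.gov/nass-CIREN/CaseForm.aspx?xsl=main.xsl&CaseID={}'

CaseIDs = [100237, 99817, 100271]


def build_case_url(caseid):
    """Hypothetical helper: return the case page URL for one case ID."""
    return BASE_URL.format(caseid)


def image_filename(caseid, index):
    """Hypothetical helper: file name that includes the case ID,
    so images from different cases do not overwrite each other."""
    return "case{}_image{}.jpg".format(caseid, index)


urls = [build_case_url(caseid) for caseid in CaseIDs]
for url in urls:
    print(url)
```

Inside the question's loop, url = build_case_url(caseid) would replace the hard-coded string, and image_filename(caseid, count) could replace the "test_image" file name.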

Upvotes: 2

Oliver.R

Reputation: 1368

You need to use an f-string to pass your caseid value in, as you're trying to do. Without the f prefix, Python treats {caseid} as literal text and never substitutes the variable:

url = f'https://crashviewer.nhtsa.dot.gov/nass-CIREN/CaseForm.aspx?xsl=main.xsl&CaseID= {caseid}'

(You probably also need to remove the space between the = and the {)
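A quick way to see the difference is to print the three variants side by side. This is only an illustrative sketch: the variable names plain, with_space and fixed are made up for the comparison, and no request is sent.

```python
caseid = 100237
base = 'https://crashviewer.nhtsa.dot.gov/nass-CIREN/CaseForm.aspx?xsl=main.xsl&CaseID='

# Plain string: no f prefix, so the braces are kept as literal characters.
plain = base + ' {caseid}'

# f-string that still has the space after the "=" sign.
with_space = f'{base} {caseid}'

# f-string with the space removed.
fixed = f'{base}{caseid}'

for label, value in [('plain', plain), ('with_space', with_space), ('fixed', fixed)]:
    # Show only the tail of each URL, where the variants differ.
    print(label, repr(value[-18:]))
```

Only the last variant ends in CaseID=100237 with nothing in between.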

Upvotes: 2
