Reputation: 113
I have been able to filter all the image URLs from a page and display them one after the other:
import requests
import shutil
import numpy as np
import cv2
from bs4 import BeautifulSoup

article_URL = "https://medium.com/bhavaniravi/build-your-1st-python-web-app-with-flask-b039d11f101c"

response = requests.get(article_URL)
soup = BeautifulSoup(response.text, 'html.parser')
images = soup.find('body').find_all('img')

i = 0
image_url = []

for im in images:
    print(im)
    i += 1
    url = im.get('src')
    image_url.append(url)
    print('Downloading: ', url)
    try:
        response = requests.get(url, stream=True)
        with open(str(i) + '.jpg', 'wb') as out_file:
            shutil.copyfileobj(response.raw, out_file)
        del response
    except:
        print('Could not download: ', url)

new = [x for x in image_url if x is not None]

for url in new:
    resp = requests.get(url, stream=True).raw
    image = np.asarray(bytearray(resp.read()), dtype="uint8")
    image = cv2.imdecode(image, cv2.IMREAD_COLOR)

    # height, width, channels = image.shape
    height, width, _ = image.shape

    dimension = []
    for items in height, width:
        dimension.append(items)

    # print(height, width)
    print(dimension)
I want to print the image with the largest dimensions from the list of URLs. This is the output I get, which is not good enough:
[72, 72]
[95, 96]
[13, 60]
[227, 973]
[17, 60]
[229, 771]
Upvotes: 0
Views: 613
Reputation: 142681
I see two problems.

First, you create dimension = [] inside the loop, so every iteration discards the previous values. You have to create dimension = [] before the loop, inside the loop use

dimension.append((width, height))

and after the loop you can use max(dimension) to get the pair with the largest width.

Second, you keep only width and height in dimension, so you don't know which file has those dimensions. You should keep all the information, for example

dimension.append((width, height, url, filename))
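Applied to your second loop, a minimal sketch of both fixes might look like this (it assumes new, the list of non-None URLs from the question, plus the requests/numpy/cv2 imports are already available; it keeps only the URL because this loop does not save files):

dimension = []  # created once, before the loop

for url in new:
    resp = requests.get(url, stream=True).raw
    image = cv2.imdecode(np.asarray(bytearray(resp.read()), dtype="uint8"),
                         cv2.IMREAD_COLOR)
    if image is None:  # skip responses that are not decodable images
        continue
    height, width = image.shape[:2]
    dimension.append((width, height, url))  # keep the URL so you know which image it is

# tuples compare element by element, so max() returns the entry with the largest width
print(max(dimension))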
My version: I use a list of dictionaries, data, to keep all of the information

data.append({
    'url': url,
    'path': filename,
    'width': width,
    'height': height,
})

and later I use the key argument in max() to get the item with the largest width

max(data, key=lambda x: x['width'])

but in the same way I could use x['height'] or x['width'] * x['height'].
import requests
from bs4 import BeautifulSoup
import shutil
import cv2

article_URL = "https://medium.com/bhavaniravi/build-your-1st-python-web-app-with-flask-b039d11f101c"

response = requests.get(article_URL)
soup = BeautifulSoup(response.text, 'html.parser')
images = soup.find('body').find_all('img')

# --- loop ---

data = []

i = 0
for img in images:
    print('HTML:', img)

    url = img.get('src')

    if url:  # skip `url` with `None`
        print('Downloading:', url)
        try:
            response = requests.get(url, stream=True)

            i += 1
            url = url.rsplit('?', 1)[0]   # remove ?opt=20 after filename
            ext = url.rsplit('.', 1)[-1]  # .png, .jpg, .jpeg
            filename = f'{i}.{ext}'
            print('Filename:', filename)

            with open(filename, 'wb') as out_file:
                shutil.copyfileobj(response.raw, out_file)

            image = cv2.imread(filename)
            height, width = image.shape[:2]

            data.append({
                'url': url,
                'path': filename,
                'width': width,
                'height': height,
            })
        except Exception as ex:
            print('Could not download: ', url)
            print('Exception:', ex)

    print('---')

# --- after loop ---

print('max:', max(data, key=lambda x: x['width']))

all_sorted = sorted(data, key=lambda x: x['width'], reverse=True)
print('Top 3:', all_sorted[:3])
# or
for item in all_sorted[:3]:
    print(item['width'], item['url'])
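The same key trick picks the largest image by area instead of by width; a small sketch reusing the data list built above:

# entry with the largest area (width * height)
biggest = max(data, key=lambda x: x['width'] * x['height'])
print('max area:', biggest['width'] * biggest['height'], biggest['url'])

# the downloaded file can then be reloaded, e.g. with cv2
largest_image = cv2.imread(biggest['path'])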
BTW: to get only the <img> tags that actually have a src attribute, you can use

.find_all('img', {'src': True})
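For example (a small sketch assuming the soup object from the code above):

# only <img> tags that actually have a src attribute
images = soup.find('body').find_all('img', {'src': True})

# equivalent keyword form
images = soup.find('body').find_all('img', src=True)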
Upvotes: 1
Reputation: 1619
Make these changes in your code, just after you create the new list:
images = []

for url in new:
    resp = requests.get(url, stream=True).raw
    image = np.asarray(bytearray(resp.read()), dtype="uint8")
    image = cv2.imdecode(image, cv2.IMREAD_COLOR)
    if image is None:  # skip URLs that could not be decoded as images
        continue
    images.append((image.shape, image))

# sort images by area (largest to smallest)
images.sort(key=lambda x: x[0][0] * x[0][1], reverse=True)
The largest image is now at index 0 and can be accessed with images[0][1]; its shape can be printed with images[0][0]. You can change the lambda function to x[0][0] (to sort by height) or x[0][1] (to sort by width) as well.
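For example, to print the shape of the largest image and save it to a file (a minimal sketch assuming the images list built above is non-empty; the filename is just an example):

largest_shape, largest_image = images[0]
print('largest shape (height, width, channels):', largest_shape)

# write the largest image to disk
cv2.imwrite('largest.jpg', largest_image)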
Upvotes: 1