Reputation: 11
Currently writing a python automation script on a website. There are 50 to 100 images hosted in a cloud, and all images are structured like this:
<img style="width:80px;height:60px;"
src="http://someimagehostingsite.net/somefolder/some_random_url_with_timestamp">
The url doesn't have any suffix like .jpg
or .png
to get the info directly.
But I was able to make it by downloading the images one by one and getting the image file size. But I need to automate this process, by just accessing every url and getting the file size. Is it possible?
Upvotes: 1
Views: 1163
Reputation: 4855
If you are just trying to get the content length of a file by URL, you can do so by downloading only the HTTP headers and checking the Content-Length
field:
import requests
url='https://commons.wikimedia.org/wiki/File:Leptocorisa_chinensis_(20566589316).jpg'
http_response = requests.get(url)
print(f"Size of image {url} = {http_response.headers['Content-Length']} bytes")
However, if the image is compressed by the server before sending, the Content-Length
field will contain the compressed file size (the amount of data that will actually be downloaded) rather than the uncompressed image size.
To do this for all of the images on a given page, you could use the BeautifulSoup HTML processing library to extract a list of URLs for all of the images on the page and check the file size as follows:
from time import sleep
import requests
from bs4 import BeautifulSoup as Soup
url='https://en.wikipedia.org/wiki/Agent_Orange'
html = Soup(requests.get(url).text)
image_links = [(url + a['href']) for a in html.find_all('a', {'class': 'image'})]
for img_url in image_links:
response = requests.get(img_url)
try:
print(f"Size of image {img_url} = {response.headers['Content-Length']} bytes")
except KeyError:
print(f"Server didn't specify content length in headers for {img_url}")
sleep(0.5)
You'll have to adjust this to your specific problem, and might have to pass other parameters to soup.find_all()
to narrow it down to the specific images you're interested in, but something similar to this will achieve what you're trying to do.
Upvotes: 1
Reputation: 6377
You could try to see if you could send a HEAD request from the browser for each image. HTTP HEAD Request in Javascript/Ajax? It depends on if the HTTP server supports it properly. I also am not sure how you get the Content-Length header but that sounds like what you want.
Upvotes: 0