Reputation: 15
I'm trying to use a web scraper that gets images from a site.
However, my methods cause a severe drop in quality.
I'm trying to scrape the images from this site.
I have tried downloading the image and using the image link I got from the href attribute. Both of these attempts cause a considerable drop in image quality. I am now considering taking a screenshot and cropping the image, but I feel like this would be a roundabout way to do it. Does anyone know of a method to get around the quality drop, or of any useful libraries?
(edit) The easiest way seems to be to download the images themselves using a combination of requests and web frameworks. I might pursue this, but I'm going to give it a little more time to stew. Does anyone know of a way to use the images without outright downloading them?
Upvotes: 1
Views: 531
Reputation: 15
I ended up following pineconee's advice: iterating through the images I wanted, downloading them, applying them where I wanted them to go, and then deleting them using the os module.
However, there was an extra caveat to my problem that required me to step outside Selenium: the file selection happens in a local file dialog, not in the browser. I solved that by using pyautogui. The time.sleep() calls seem to be a must; if the automation goes too fast, the file doesn't seem to catch and the script gets stuck on the file explorer.
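import os
import time

import pyautogui
import requests
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


def loadImages(driver, images):
    try:
        for image in images:
            # Name the local file after the last URL segment, forcing a .jpg extension.
            filename = image.split('/')[-1].split('.')[0] + ".jpg"
            imgData = requests.get(image).content
            with open(filename, "wb") as f:
                f.write(imgData)
            filepath = os.path.abspath(filename)
            # Click the button that opens the local file dialog.
            boxx = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[@class='dn_ht']")))
            boxx.click()
            time.sleep(1)  # give the file dialog time to open
            pyautogui.write(filepath)
            time.sleep(1)
            pyautogui.press('enter')
            time.sleep(5)  # let the file selection complete before deleting the file
            os.remove(filename)
    except Exception as e:
        print(e)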
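The download, use, delete cycle above could also be wrapped in a context manager, so the file is removed even if the upload step raises. This is only a minimal sketch using the standard library; `temporary_image` is a hypothetical helper name, not something from Selenium, pyautogui, or the code above.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_image(data, suffix=".jpg"):
    """Write `data` to a temp file, yield its absolute path, delete it on exit."""
    # mkstemp returns an open file descriptor and an absolute path.
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)  # raw bytes, written unchanged
        yield path
    finally:
        # Runs on normal exit and on exceptions, so no stray files are left behind.
        if os.path.exists(path):
            os.remove(path)
```

Inside the loop this would be used as `with temporary_image(imgData) as filepath:`, with the click, `pyautogui.write(filepath)`, and the sleeps placed in the `with` body. The file has to be closed before the path is typed into the dialog, which is why the sketch writes and closes it before yielding.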
Upvotes: 0
Reputation: 16
Manually getting the data through requests and writing it to a file with the same extension should work. The bytes are written exactly as the server sent them, so no re-encoding takes place.
import requests
url = "https://img.ltwebstatic.com/images3_pi/2023/04/22/1682132328cfa167169f129c340da4fc854d5587b4_thumbnail_600x.jpg"
img_data = requests.get(url).content
with open('image.jpg', 'wb') as f:
    f.write(img_data)
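A slightly more defensive variant of the snippet above, as a sketch: it takes the file name (and therefore the extension) from the URL path instead of hard-coding `image.jpg`, and it raises on HTTP errors rather than saving an error page. The helper names `filename_from_url` and `download_image` are hypothetical and not part of requests, and the 30-second timeout is an arbitrary choice.

```python
import os
import posixpath
from urllib.parse import urlparse

import requests


def filename_from_url(url, default="image.jpg"):
    """Return the last path segment of the URL, or `default` if it is empty."""
    # urlparse drops the query string, so "b.png?x=1" yields "b.png".
    name = posixpath.basename(urlparse(url).path)
    return name or default


def download_image(url, directory="."):
    """Save the bytes at `url` unchanged and return the local path."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail on 4xx/5xx responses
    path = os.path.join(directory, filename_from_url(url))
    with open(path, "wb") as f:
        f.write(response.content)  # raw bytes, no re-encoding
    return path
```

Calling `download_image(url)` with the URL above would save the file under its original name in the current directory.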
Upvotes: 0