Reputation: 9
I made a script in Python for my friend (I lost a bet) which downloads all of the thumbnail images (about 50 images, each about 20 kB) using the URLs stored in the data-thumb_url attribute.
Can this code break the website or affect it badly (I mean DDoS or something like that)? I have used it a few times for 10, 20, and 30 images and it works perfectly, and the website works normally too (it is a very popular website, one of the most popular in the world, and nothing says that web scraping is illegal on this website), but I need to know if the code is safe.
from PIL import Image
from bs4 import BeautifulSoup
import requests
import os

url = ''  # (here is the url of website)
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
images = soup.find_all('img')

# Collect the thumbnail URL from every <img> that has a data-thumb_url attribute
listt = []
for i in images:
    try:
        listt.append(i['data-thumb_url'])
    except KeyError:
        pass

# Download the images one at a time and save each one as a JPEG
for i in range(len(listt)):
    img = Image.open(requests.get(listt[i], stream = True).raw)
    img.save("image"+str(i)+".jpg")
I know that it's a slightly silly question considering the site gets 80-100 million views per day, and that there are free extensions, websites, and programs for downloading images from websites, but I'm new to bs and requests in Python, and I'm anxious.
Upvotes: 0
Views: 159
Reputation: 555
If you are accessing multiple URLs, even with the sleep, the site might have other security measures that you could trigger (for example, a prove-you-are-human check). This might cause your script to fail when you try to access other pages...
Without seeing the site you are hitting and the number of pages, it is hard to say for certain. But Cargo23 is right: as it stands now, you won't be breaking the site anytime soon.
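If you want the script to notice when it has tripped such a measure instead of failing silently, here is a rough sketch. It assumes the site signals blocking with HTTP 403 or 429, which is only a guess (some sites return a CAPTCHA page with status 200 instead). The name fetch_with_backoff and its parameters are made up for illustration; get can be any callable that returns an object with a status_code, such as requests.get.

```python
import time

# Status codes that commonly signal rate limiting or blocking.
# Assumption: the real site may use different codes or a CAPTCHA page.
BLOCK_CODES = {403, 429}


def fetch_with_backoff(get, url, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call get(url); on a blocking status, wait and retry.

    The wait doubles after every blocked attempt. Returns the first
    response that is not blocked, or None if every attempt was blocked.
    """
    delay = base_delay
    for _ in range(retries):
        response = get(url)
        if response.status_code not in BLOCK_CODES:
            return response
        sleep(delay)  # back off before trying again
        delay *= 2
    return None
```

With requests installed, fetch_with_backoff(requests.get, some_url) would return the response, or None once the script has been blocked on every attempt, at which point it is probably best to stop rather than keep hitting the site.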
Upvotes: 1
Reputation: 3189
Firstly, in the code you provided, you name the list of URLs listt in most places, but you call it lista when appending.
Secondly, no, your code isn't going to break a website. Because you are just running Python in a single thread, it will only make one request at a time. If you want to be super cautious, you can add a time.sleep inside your last for loop, but that isn't really necessary.
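For what it's worth, the sleep idea could look something like the sketch below. The helper name download_politely, the fetch callable, and the one-second default are my own choices for illustration, not anything from your script; pick whatever delay you are comfortable with.

```python
import time


def download_politely(urls, fetch, delay=1.0, sleep=time.sleep):
    """Fetch each URL in order, pausing `delay` seconds between requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay)  # pause before every request except the first
        results.append(fetch(url))
    return results
```

In your case fetch could be something like lambda u: requests.get(u, timeout=10).content, and you would then write each returned bytes object to its own file. With about 50 thumbnails and a one-second pause, the whole run takes under a minute longer.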
Upvotes: 1