Reputation: 41
I am making an image scraper and want to take some of the photos from this link and save them in a folder named dribblephotos:
https://dribbble.com/search/shots/popular/illustration?q=sneaker%20
Here are the links I've retrieved:
https://static.dribbble.com/users/458522/screenshots/6040912/nike_air_huarache_1x.jpg
https://static.dribbble.com/users/458522/avatars/mini/0e524c2621e12569378282793e1ce72b.png?1580329767
https://static.dribbble.com/users/105681/screenshots/3944640/hype_1x.png
https://static.dribbble.com/users/105681/avatars/mini/avatar-01-01.png?1377980605
https://static.dribbble.com/users/923409/screenshots/7179093/basketball_marly_gallardo_1x.jpg
https://static.dribbble.com/users/923409/avatars/mini/bc17b2db165c31804e1cbb1d4159462a.jpg?1596192494
https://static.dribbble.com/users/458522/screenshots/6034458/nike_air_jordan_i_1x.jpg
https://static.dribbble.com/users/458522/avatars/mini/0e524c2621e12569378282793e1ce72b.png?1580329767
https://static.dribbble.com/users/1237425/screenshots/5071294/customize_air_jordan_web_2x.png
https://static.dribbble.com/users/1237425/avatars/mini/87ae45ac7a07dd69fe59985dc51c7f0f.jpeg?1524130139
https://static.dribbble.com/users/1174720/screenshots/6187664/adidas_2x.png
https://static.dribbble.com/users/1174720/avatars/mini/9de08da40078e869f1a680d2e43cdb73.png?1588733495
https://static.dribbble.com/users/179617/screenshots/4426819/ultraboost_1x.png
https://static.dribbble.com/users/179617/avatars/mini/2d545dc6c0dffc930a2b20ca3be88802.jpg?1596735027
https://static.dribbble.com/users/458522/screenshots/6126041/nike_air_max_270_1x.jpg
https://static.dribbble.com/users/458522/avatars/mini/0e524c2621e12569378282793e1ce72b.png?1580329767
https://static.dribbble.com/users/60266/screenshots/6698826/nike_shoe_2x.jpg
https://static.dribbble.com/users/60266/avatars/mini/64826d925db1d4178258d17d8826842b.png?1549028805
https://static.dribbble.com/users/78464/screenshots/4950025/8x600_1x.jpg
https://static.dribbble.com/users/78464/avatars/mini/a9ae6a559ab479d179e8bd22591e4028.jpg?1465908886
https://static.dribbble.com/users/458522/screenshots/6118702/adidas_nmd_r1_1x.jpg
https://static.dribbble.com/users/458522/avatars/mini/0e524c2621e12569378282793e1ce72b.png?1580329767
https://static.dribbble.com/users/458522/screenshots/6098953/nike_lebron_10_je_icon_qs_1x.jpg
https://static.dribbble.com/users/458522/avatars/mini/0e524c2621e12569378282793e1ce72b.png?1580329767
https://static.dribbble.com/users/879147/screenshots/7152093/img_0966_2x.png
https://static.dribbble.com/users/879147/avatars/mini/e095f3837f221bb2ef652dcc966b99f7.jpg?1568473177
https://static.dribbble.com/users/458522/screenshots/6128979/nerd_x_adidas_pharrell_hu_nmd_trail_1x.jpg
https://static.dribbble.com/users/458522/avatars/mini/0e524c2621e12569378282793e1ce72b.png?1580329767
https://static.dribbble.com/users/879147/screenshots/11064235/26fa4a2d-9033-4953-b48f-4c0e8a93fc9d_2x.png
https://static.dribbble.com/users/879147/avatars/mini/e095f3837f221bb2ef652dcc966b99f7.jpg?1568473177
https://static.dribbble.com/users/458522/screenshots/6132938/nike_moon_racer_1x.jpg
https://static.dribbble.com/users/458522/avatars/mini/0e524c2621e12569378282793e1ce72b.png?1580329767
https://static.dribbble.com/users/1823684/screenshots/5973495/jordannn1_2x.png
https://static.dribbble.com/users/1823684/avatars/mini/f6041c082aec67302d4b78b8d203f02b.png?1509719582
https://static.dribbble.com/users/552027/screenshots/4666241/airmax270_1x.jpg
https://static.dribbble.com/users/552027/avatars/mini/35bb0dcb5a6619f68816290898bff6cc.jpg?1535884243
https://static.dribbble.com/users/458522/screenshots/6044426/adidas_pharrell_hu_nmd_trail_1x.jpg
https://static.dribbble.com/users/458522/avatars/mini/0e524c2621e12569378282793e1ce72b.png?1580329767
https://static.dribbble.com/users/220914/screenshots/11295053/woman_shoe_tree_floating2_2x.png
https://static.dribbble.com/users/220914/avatars/mini/d364a9c166edb6d96cc059a836219a7d.jpg?1590773568
https://static.dribbble.com/users/4040486/screenshots/7079508/___2x.png
https://static.dribbble.com/users/4040486/avatars/mini/f31e9b50df877df815177e2015135ff7.png?1582521697
https://static.dribbble.com/users/57602/screenshots/12909636/d2_2x.png
https://static.dribbble.com/users/57602/avatars/mini/b4c27f3be2c61d82fbc821433d058b04.jpg?1575089000
https://static.dribbble.com/users/458522/screenshots/6049522/nike_x_john_elliott_lebron_10_soldier_1x.jpg
https://static.dribbble.com/users/458522/avatars/mini/0e524c2621e12569378282793e1ce72b.png?1580329767
https://static.dribbble.com/users/1025917/screenshots/9738550/vans-2020-pixelwolfie-dribbble_2x.png
https://static.dribbble.com/users/1025917/avatars/mini/87fdcb145eab0b47eda29fc873f25f8c.png?1594466719
https://static.dribbble.com/assets/icon-backtotop-1b04df73090f6b0f3192a3b71874ca3b3cc19dff16adc6cf365cd0c75897f6c0.png
https://static.dribbble.com/assets/dribbble-ball-icon-e94956d5f010d19607348176b0ae90def55d61871a43cb4bcb6d771d8d235471.svg
https://static.dribbble.com/assets/icon-shot-x-light-40c073cd65443c99d4ac129b69bf578c8cf97d69b78990c00c4f8c5873b0d601.png
https://static.dribbble.com/assets/icon-shot-prev-light-ca583c76838d54eca11832ebbcaba09ba8b2bf347de2335341d244ecb9734593.png
https://static.dribbble.com/assets/icon-shot-next-light-871a18220c4c5a0325d1353f8e4cc204c3b49beacc63500644556faf25ded617.png
https://static.dribbble.com/assets/dribbble-square-c8c7a278e96146ee5a9b60c3fa9eeba58d2e5063793e2fc5d32366e1b34559d3.png
https://static.dribbble.com/assets/dribbble-ball-192-ec064e49e6f63d9a5fa911518781bee0c90688d052a038f8876ef0824f65eaf2.png
https://static.dribbble.com/assets/icon-overlay-x-2x-b7df2526b4c26d4e8410a7c437c433908be0c7c8c3c3402c3e578af5c50cf5a5.png
However, I only want to grab the URLs that contain the string "screenshots". So I tried writing a function to grab only the images whose URL contains "screenshots", for example:
https://static.dribbble.com/users/923409/screenshots/7179093/basketball_marly_gallardo_1x.jpg
At first, just to see if it worked at all, I made a function to print the specific links I wanted. However, it didn't work. Here is my function code:
def art_links():
    images = []
    for img in x:
        images.append(img['src'])
    images = soup2.find_all("screenshots")
    print(images)
Here is my full code:
from bs4 import BeautifulSoup
import requests as rq
import os

r2 = rq.get("https://dribbble.com/search/shots/popular/illustration?q=sneaker%20")
soup2 = BeautifulSoup(r2.text, "html.parser")

links = []
x = soup2.select('img[src^="https://static.dribbble.com"]')
for img in x:
    links.append(img['src'])

def art_links():
    images = []
    for img in x:
        images.append(img['src'])
    images = soup2.find_all("screenshots")
    print(images)
    os.mkdir('dribblephotos')
    for index, img_link in enumerate(links):
        if "screenshots" in images:
        img_data = r.get(img_link).content
        with open("dribblephotos/" + str(index + 1) + '.jpg', 'wb+') as f:
            f.write(img_data)
        else:
            break

art_links()
Upvotes: 0
Views: 120
Reputation: 34
I'm noticing a small issue with the syntax of your code at the if statement near the end (its body isn't tabbed over under the if), so I reformatted it a bit to get it closer to what you wanted. I think what's happening is that the else branch breaks out of the for loop at the end: as soon as one entry doesn't have "screenshots" in its link, the loop stops entirely instead of moving on to the next entry. There is a continue keyword you could use here, but it is enough to simply drop the else. You are also checking for "screenshots" in images, while the link you actually want to check is called img_link in your for loop. Try this for your for loop at the end and see what you get:
for index, img_link in enumerate(links):
    if "screenshots" in img_link:
        img_data = rq.get(img_link).content
        with open("dribblephotos/" + str(index + 1) + '.jpg', 'wb+') as f:
            f.write(img_data)
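If you prefer an explicit skip, a minimal sketch of the same loop using the continue keyword mentioned above (same links list and dribblephotos folder as before) would be:

for index, img_link in enumerate(links):
    if "screenshots" not in img_link:
        # Not a shot image (e.g. an avatar or icon); skip it and keep looping
        continue
    img_data = rq.get(img_link).content
    with open("dribblephotos/" + str(index + 1) + '.jpg', 'wb+') as f:
        f.write(img_data)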
If you still need the links themselves rather than just the downloaded files, you can collect them as you loop through the images and store each one in a new list whenever it is a screenshot link.
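For example, a minimal sketch of that idea (the screenshot_links name is just illustrative, reusing the links list your code already builds):

screenshot_links = []  # only the "screenshots" URLs, no avatars or icons
for img_link in links:
    if "screenshots" in img_link:
        screenshot_links.append(img_link)
print(screenshot_links)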
UPDATE: This newest version works for me. I removed the function that filtered the URLs after they had already been collected in a loop, since that second pass was unnecessary. The first for loop is all you need: iterating twice is redundant, so I check each URL the first time through and only save it to the links list (and download it) when it is a screenshot link.
from bs4 import BeautifulSoup
import requests as rq
import os

r2 = rq.get("https://dribbble.com/search/shots/popular/illustration?q=sneaker%20")
soup2 = BeautifulSoup(r2.text, "html.parser")

links = []
x = soup2.select('img[src^="https://static.dribbble.com"]')

os.mkdir('dribblephotos')

# Only one for loop required; no need to iterate over the results twice
for index, img in enumerate(x):
    # Store the current url from the image result
    url = img["src"]
    # Check the url for "screenshot" before putting it in links
    if "screenshot" in url:
        links.append(img['src'])
        # Download the image
        img_data = rq.get(url).content
        # Put the image into the file
        with open("dribblephotos/" + str(index + 1) + '.jpg', 'wb+') as f:
            f.write(img_data)

print(links)
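One small caveat with the snippet above: os.mkdir raises FileExistsError if dribblephotos already exists, and every file is saved as .jpg even when the source image is a .png. A minimal sketch of a helper that handles both (the save_image name and its parameters are just illustrative, not part of the original code):

import os

def save_image(url, img_data, index, folder="dribblephotos"):
    # Save img_data under folder, keeping the URL's own file extension
    os.makedirs(folder, exist_ok=True)  # don't fail if the folder already exists
    # Strip any query string, then reuse the original extension (fall back to .jpg)
    ext = os.path.splitext(url.split('?')[0])[1] or '.jpg'
    with open(os.path.join(folder, str(index + 1) + ext), 'wb') as f:
        f.write(img_data)

Inside the loop you would then call save_image(url, rq.get(url).content, index) in place of the os.mkdir/open lines.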
Upvotes: 1