Links from BeautifulSoup without href or <a>

Question

I am trying to create a bot that scrapes all the image links from a site and stores them somewhere else so I can download the images afterwards.

from selenium import webdriver
import time
from bs4 import BeautifulSoup as bs  
import requests

url = 'https://www.artstation.com/artwork?sorting=trending'
page = requests.get(url)
driver = webdriver.Chrome()
driver.get(url)
time.sleep(3)
soup = bs(driver.page_source, 'html.parser')
gallery =  soup.find_all(class_="image-src")
data = gallery[0]
for x in range(len(gallery)):
    print("TAG:", sep="\n")
    print(gallery[x], sep="\n")

if page.status_code == 200:  
    print("Request OK")

This returns all the link tags I wanted, but I can't find a way to remove the HTML or copy only the links to a new list. Here is an example of the tag I get:

So, how do I get only the links within the gallery[] list? What I want to do afterwards is take these links and change the /smaller-square/ directory to /large/, which is the one that has the high-resolution image.

Andrej Kesely · Accepted Answer

The page loads its data through AJAX, so the browser's network inspector shows where the call is made. This snippet obtains all the image links found on page 1, sorted by trending:

import requests
import json

url = 'https://www.artstation.com/projects.json?page=1&sorting=trending'
page = requests.get(url)
json_data = json.loads(page.text)

for data in json_data['data']:
    print(data['cover']['medium_image_url'])

Prints:

https://cdna.artstation.com/p/assets/images/images/012/272/796/medium/ben-zhang-brigitte-hero-concept.jpg?1533921480
https://cdna.artstation.com/p/assets/covers/images/012/279/572/medium/ham-sung-choul-braveking-140823-1-3-s3-mini.jpg?1533959982
https://cdnb.artstation.com/p/assets/covers/images/012/275/963/medium/michael-vicente-orb-gem-thumb.jpg?1533933774
https://cdnb.artstation.com/p/assets/images/images/012/275/635/medium/michael-kutsche-piglet-by-michael-kutsche.jpg?1533932387
https://cdna.artstation.com/p/assets/images/images/012/273/384/medium/ben-zhang-unnamed.jpg?1533923353
https://cdnb.artstation.com/p/assets/covers/images/012/273/083/medium/michael-vicente-orb-guardian-thumb.jpg?1533922229

... and so on.
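
To get from these links to the high-resolution version mentioned in the question, one option is to swap the size segment of the path. This is only a sketch: the helper name to_large is made up for illustration, and it assumes that the size appears as a single path segment (/medium/ or /smaller-square/) and that a /large/ variant exists for each image, as the question states. I have not checked that every URL, the cover images in particular, has a /large/ counterpart.

```python
import re

# Assumption: the size is one path segment such as /medium/ or
# /smaller-square/, and a /large/ variant exists (taken from the question).
SIZE_SEGMENT = re.compile(r"/(?:medium|smaller-square)/")


def to_large(url):
    """Hypothetical helper: swap the first size segment in url for /large/."""
    return SIZE_SEGMENT.sub("/large/", url, count=1)


medium = (
    "https://cdna.artstation.com/p/assets/images/images/"
    "012/273/384/medium/ben-zhang-unnamed.jpg?1533923353"
)
print(to_large(medium))
```

URLs without a recognised size segment are returned unchanged, so it is safe to map the helper over the whole list and check the results before downloading.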

If you print the variable json_data, you will see the other information the page sends (such as the icon image URL, total_count, and data about the author).
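
If you would rather stay with the Selenium output from the question, a rough alternative is to pull anything that looks like a URL out of each tag's markup with a regular expression. The sample tag below is hypothetical, since the real tag is not shown in the question, and extract_urls is an illustrative name, not part of BeautifulSoup.

```python
import re

# Matches http(s) URLs up to the first whitespace, quote, angle bracket
# or closing parenthesis. This is a loose pattern, not a full URL parser.
URL_PATTERN = re.compile(r"""https?://[^\s"'<>)]+""")


def extract_urls(markup):
    """Hypothetical helper: return every URL-like substring found in markup."""
    return URL_PATTERN.findall(markup)


# Hypothetical markup; the actual tag from the page may look different.
sample = (
    '<div class="image-src" style="background-image: '
    'url(https://example.com/smaller-square/a.jpg?1)"></div>'
)
print(extract_urls(sample))
```

With the list from the question, something like [u for tag in gallery for u in extract_urls(str(tag))] should collect the links into a new list, assuming the URLs appear literally in the tag's attributes.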
