Eyal S.

Reputation: 1161

Reading an image url with beautifulsoup

I'm trying to read a picture from a website. This is my code so far:

from bs4 import BeautifulSoup
import requests

url = 'https://www.basketball-reference.com/players/h/hardeja01.html'
page_request = requests.get(url)
soup = BeautifulSoup(page_request.text,"lxml")
img_src = soup.find("div", {"class": "media-item"})
print(img_src)
# <div class="media-item"><img alt="Photo of James Harden" itemscope="image" src="https://d2cwpp38twqe55.cloudfront.net/req/201804182/images/players/hardeja01.jpg"/>\n</div>

I'm interested in the URL of the jpg image. I could write a regular expression to pull out the jpg URL, but there must be an easier way.

What is the best way to extract the url of the jpg?

Upvotes: 0

Views: 119

Answers (3)

SIM

Reputation: 22440

You can do that in several ways. This is one such approach:

import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.basketball-reference.com/players/h/hardeja01.html")
soup = BeautifulSoup(page.text, 'html.parser')
image = soup.find(itemscope="image")['src']
print(image)

Output:

https://d2cwpp38twqe55.cloudfront.net/req/201804182/images/players/hardeja01.jpg
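To try the attribute lookup without hitting the site, here is a small sketch that parses the `<div>` fragment printed in the question. It uses the built-in "html.parser" backend so lxml isn't needed. It also uses `.get("src")` instead of `["src"]`, which returns None rather than raising KeyError if the attribute is missing. Treat it as an illustration under those assumptions, not the answerer's exact code.

```python
from bs4 import BeautifulSoup

# The <div> fragment printed in the question, inlined so this runs offline.
html = (
    '<div class="media-item"><img alt="Photo of James Harden" itemscope="image" '
    'src="https://d2cwpp38twqe55.cloudfront.net/req/201804182/images/players/hardeja01.jpg"/>\n</div>'
)

soup = BeautifulSoup(html, "html.parser")

# A keyword argument to find() filters on that attribute's value.
img = soup.find(itemscope="image")

# tag.get("src") returns None instead of raising KeyError when src is absent.
image = img.get("src") if img is not None else None
print(image)
```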

Upvotes: 1

radzak

Reputation: 3118

You can use the select_one method, which accepts a CSS selector and returns the first matching element:

img_src = soup.select_one('.media-item > img')['src']
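Note that select_one returns None when nothing matches, so indexing ['src'] directly raises a TypeError on a page without that element. This is a minimal sketch of a guard. It uses a hypothetical helper, first_img_src (not part of BeautifulSoup), and the fragment from the question inlined so it runs offline:

```python
from bs4 import BeautifulSoup

HTML = (
    '<div class="media-item"><img alt="Photo of James Harden" itemscope="image" '
    'src="https://d2cwpp38twqe55.cloudfront.net/req/201804182/images/players/hardeja01.jpg"/>\n</div>'
)


def first_img_src(soup, selector=".media-item > img"):
    """Hypothetical helper: src of the first element matching selector, or None."""
    img = soup.select_one(selector)  # None if the selector matches nothing
    return img.get("src") if img is not None else None


soup = BeautifulSoup(HTML, "html.parser")
print(first_img_src(soup))
print(first_img_src(soup, ".no-such-class > img"))  # no match, so None
```

The `>` in the selector is the child combinator, so it only matches an `<img>` that is a direct child of the `.media-item` div.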

You can also try out Requests-HTML:

from requests_html import HTMLSession

session = HTMLSession()
r = session.get('https://www.basketball-reference.com/players/h/hardeja01.html')
print(r.html.find('.media-item > img', first=True).attrs['src'])
# https://d2cwpp38twqe55.cloudfront.net/req/201804182/images/players/hardeja01.jpg

Upvotes: 1

Eyal S.

Reputation: 1161

There is a very simple solution:

img_src = soup.find("div", class_="media-item").find('img')['src']
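In BeautifulSoup, accessing a tag name as an attribute (div.img) is shorthand for div.find("img"), so the chain can be shortened. This sketch runs offline against the fragment from the question. Unlike the `.media-item > img` CSS selector, find("img") searches all descendants, not only direct children.

```python
from bs4 import BeautifulSoup

html = (
    '<div class="media-item"><img alt="Photo of James Harden" itemscope="image" '
    'src="https://d2cwpp38twqe55.cloudfront.net/req/201804182/images/players/hardeja01.jpg"/>\n</div>'
)
soup = BeautifulSoup(html, "html.parser")

div = soup.find("div", class_="media-item")
# div.img is shorthand for div.find("img"): the first <img> anywhere inside div.
img_src = div.img["src"]
print(img_src)
```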

Upvotes: 0
