Reputation: 1161
I'm trying to read a picture from a website. This is my code so far:
from bs4 import BeautifulSoup
import requests
url = 'https://www.basketball-reference.com/players/h/hardeja01.html'
page_request = requests.get(url)
soup = BeautifulSoup(page_request.text,"lxml")
img_src = soup.find("div", {"class": "media-item"})
print(img_src)
# <div class="media-item"><img alt="Photo of James Harden" itemscope="image" src="https://d2cwpp38twqe55.cloudfront.net/req/201804182/images/players/hardeja01.jpg"/>\n</div>
I'm interested in the URL of the jpg image. I could write a regular expression to pull it out, but there must be an easier way.
What is the best way to extract the url of the jpg?
Upvotes: 0
Views: 119
Reputation: 22440
You can do that in several ways. Here is one such approach:
import requests
from bs4 import BeautifulSoup
page = requests.get("https://www.basketball-reference.com/players/h/hardeja01.html")
soup = BeautifulSoup(page.text, 'html.parser')
image = soup.find(itemscope="image")['src']
print(image)
Output:
https://d2cwpp38twqe55.cloudfront.net/req/201804182/images/players/hardeja01.jpg
Upvotes: 1
Reputation: 3118
You can use a select method that works with CSS selectors:
img_src = soup.select_one('.media-item > img')['src']
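A minimal offline sketch of the selector approach, run against the div printed in the question instead of the live page. The helper name first_img_src is hypothetical, and the None check is an addition: select_one returns None when nothing matches, so indexing ['src'] directly would raise a TypeError in that case.

```python
from bs4 import BeautifulSoup

# Markup copied from the output shown in the question.
HTML = (
    '<div class="media-item"><img alt="Photo of James Harden" itemscope="image" '
    'src="https://d2cwpp38twqe55.cloudfront.net/req/201804182/images/players/hardeja01.jpg"/>\n</div>'
)


def first_img_src(html, selector=".media-item > img"):
    """Return the src of the first element matching selector, or None."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.select_one(selector)
    # .get() also returns None if the tag exists but has no src attribute.
    return tag.get("src") if tag is not None else None


print(first_img_src(HTML))
```

Returning None instead of raising lets the caller decide what to do when the page layout changes and the selector stops matching.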
You can also try out Requests-HTML:
from requests_html import HTMLSession
session = HTMLSession()
r = session.get('https://www.basketball-reference.com/players/h/hardeja01.html')
print(r.html.find('.media-item > img', first=True).attrs['src'])
# https://d2cwpp38twqe55.cloudfront.net/req/201804182/images/players/hardeja01.jpg
Upvotes: 1
Reputation: 1161
There is a very simple solution:
img_src = soup.find("div", class_="media-item").find('img')['src']
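As a quick sanity check, the three BeautifulSoup lookups suggested in this thread can be compared on the div printed in the question. This is only an offline sketch: the variable names are illustrative, and the live page's markup may have changed since the question was asked.

```python
from bs4 import BeautifulSoup

# Markup copied from the output shown in the question.
HTML = (
    '<div class="media-item"><img alt="Photo of James Harden" itemscope="image" '
    'src="https://d2cwpp38twqe55.cloudfront.net/req/201804182/images/players/hardeja01.jpg"/>\n</div>'
)

soup = BeautifulSoup(HTML, "html.parser")

# 1. Match the img tag by its itemscope attribute.
by_attribute = soup.find(itemscope="image")["src"]

# 2. Match with a CSS selector: an img that is a direct child of .media-item.
by_selector = soup.select_one(".media-item > img")["src"]

# 3. Find the wrapping div first, then the img inside it.
by_chained_find = soup.find("div", class_="media-item").find("img")["src"]

print(by_attribute == by_selector == by_chained_find)
```

On this snippet all three lookups reach the same img tag, so the choice mostly comes down to which part of the markup is least likely to change.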
Upvotes: 0