Scraping Deeply Nested HTML Tags, Python

Question

I know there are similar questions, but none of the answers worked for me. I am trying to get the src link from the img tag with class "ui mini avatar image".

I did not want to walk through six nested div tags to reach it, but neither approach worked: the search returns None. This is the code I am trying to use:

urls = df.iloc[:,-1].values  
for url in urls:
  soup = BeautifulSoup(url, 'lxml')
  a_tag = soup.find('a', class_ ='left')
  img = a_tag.find('img', class_= 'ui mini avatar image')
  link = img.get_attribute('src')
  print(link)
  break

Please note that I stored the links I needed in a csv file and I am using Pandas to read it. Here is the first link if you need to check it out: https://www.zomato.com/review/kMZEvAx

jizhihaoSAMA · Accepted Answer

Use a CSS selector with select_one:

from bs4 import BeautifulSoup
import requests

headers = {
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36"
}


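# Fetch the page first, then parse the returned HTML (not the URL string)
response = requests.get("https://www.zomato.com/review/kMZEvAx", headers=headers)
soup = BeautifulSoup(response.text, "lxml")
# Chained dots match a single element that carries all four classes
print(soup.select_one(".ui.mini.avatar.image").get("src"))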

Output:

https://b.zmtcdn.com/images/placeholder_200.png
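
Since the question loops over many URLs, it may help to wrap the lookup so that a page without the avatar does not raise. The following is only a minimal sketch: the helper name avatar_src is hypothetical, the built-in html.parser is assumed, and the inline HTML is made up for illustration rather than taken from Zomato's actual markup.

```python
from bs4 import BeautifulSoup


def avatar_src(html):
    """Return the src of the first img with all four classes, or None."""
    soup = BeautifulSoup(html, "html.parser")
    img = soup.select_one("img.ui.mini.avatar.image")
    # select_one returns None when nothing matches, so guard before .get()
    return img.get("src") if img is not None else None


# Made-up markup for illustration only
sample = '<div><a class="left"><img class="ui mini avatar image" src="avatar.png"></a></div>'

print(avatar_src(sample))                  # avatar.png
print(avatar_src("<p>no image here</p>"))  # None
```

In the loop from the question, the argument would be the downloaded page, for example requests.get(url, headers=headers).text, rather than the URL itself.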

Why didn't your solution work?

find returns only the first tag that matches, here the first a tag whose class is left. If that tag does not contain the avatar image, the nested find returns None.
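
To illustrate, here is a small offline sketch using made-up HTML (an assumption, not the real Zomato page). It shows find stopping at the first a.left, which has no image inside, while select_one searches the whole tree. It also shows a second thing that appears to be going on in the question's code: passing a URL string to BeautifulSoup parses that string itself rather than downloading the page, so nothing can be found in it.

```python
import warnings

from bs4 import BeautifulSoup

# Made-up markup: two links with class "left", only the second holds the image
html = """
<div>
  <a class="left" href="/first">no image in this link</a>
  <div><div>
    <a class="left" href="/second">
      <img class="ui mini avatar image" src="avatar.png">
    </a>
  </div></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find returns only the first matching tag
first_link = soup.find("a", class_="left")
print(first_link["href"])  # /first

# The first link has no img inside, so the nested lookup yields None
nested = first_link.find("img", class_="ui mini avatar image")
print(nested)  # None

# select_one searches the whole document for an element with all four classes
img = soup.select_one(".ui.mini.avatar.image")
print(img.get("src"))  # avatar.png

# Parsing a URL string does not fetch the page; the text is treated as markup
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # bs4 warns that the input looks like a URL
    url_soup = BeautifulSoup("https://www.zomato.com/review/kMZEvAx", "html.parser")
print(url_soup.find("a", class_="left"))  # None
```

Note also that get_attribute is a Selenium method; on a BeautifulSoup tag the attribute is read with .get("src") or tag["src"].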
