Jana.k

Reputation: 97

Scraping Deeply Nested HTML Tags, Python

I know that there are similar questions, but none of the answers worked for me. I am trying to get the src link from the img tag with class "ui mini avatar image", which sits inside several nested tags (screenshot: nested tags).

I did not want to walk through six div tags to reach it, but neither approach worked: the search returns None (NoneType). This is the code I am trying to use:

urls = df.iloc[:,-1].values  
for url in urls:
  soup = BeautifulSoup(url, 'lxml')
  a_tag = soup.find('a', class_ ='left')
  img = a_tag.find('img', class_= 'ui mini avatar image')
  link = img.get_attribute('src')
  print(link)
  break

Please note that I stored the links I need in a CSV file and am reading it with pandas. Here is the first link if you want to check it: https://www.zomato.com/review/kMZEvAx

Upvotes: 0

Views: 351

Answers (1)

jizhihaoSAMA

Reputation: 12672

Use a CSS selector with select_one:

from bs4 import BeautifulSoup
import requests

headers = {
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36"
}


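# Name the parser explicitly ("lxml", as in the question) to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(requests.get("https://www.zomato.com/review/kMZEvAx", headers=headers).text, "lxml")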
print(soup.select_one(".ui.mini.avatar.image").get("src"))

Print:

https://b.zmtcdn.com/images/placeholder_200.png

  • Why doesn't your solution work?

find returns only the first <a> tag whose class is left. If that first match does not contain the avatar img, the nested find returns None.
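
To make the difference concrete, here is a minimal offline sketch. The markup in SAMPLE_HTML and the example.com URL are made up for illustration (the real Zomato page will look different), and html.parser is used only so the snippet runs without lxml. It also shows something I believe contributes to the NoneType result in the question: BeautifulSoup parses markup, so passing it the URL string itself gives a document with no tags at all.

```python
import warnings

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (an assumption, not Zomato's HTML)
SAMPLE_HTML = """
<div class="header"><a class="left" href="/home">Home</a></div>
<div class="review"><div><div>
  <a class="left" href="/user/1">
    <img class="ui mini avatar image" src="https://example.com/avatar.png">
  </a>
</div></div></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() stops at the first <a class="left">, which has no <img> inside it
first_a = soup.find("a", class_="left")
img_in_first = first_a.find("img", class_="ui mini avatar image")
print(img_in_first)

# select_one() matches the <img> directly, however deeply it is nested
avatar = soup.select_one(".ui.mini.avatar.image")
avatar_src = avatar.get("src") if avatar is not None else None
print(avatar_src)

# Parsing the URL string (instead of the downloaded HTML) yields no tags
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # bs4 warns that the input looks like a URL
    url_soup = BeautifulSoup("https://www.zomato.com/review/kMZEvAx", "html.parser")
print(url_soup.find("a"))
```

Also note that a bs4 tag has no get_attribute method (that name comes from Selenium); use tag.get("src") or tag["src"], and check for None before reading the attribute.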

Upvotes: 2
