MrRBM97

Reputation: 169

Web Scraping Div Class Not Found

I am trying to scrape information from chambers.com, specifically this example page: https://chambers.com/law-firm/allen-overy-llp-global-2:7. The information I want is the list of departments and their bands under the "UK" section of the "Ranked Department" tab.

The problem I'm currently having is with Beautiful Soup's find_all and, I assume, the parser. I want to find all <div class="mb-3"> elements. The code I have so far is:

import requests
from bs4 import BeautifulSoup
url_to_scrape = 'https://chambers.com/law-firm/allen-overy-llp-global-2:7'

plain_html_text = requests.get(url_to_scrape)

soup = BeautifulSoup(plain_html_text.content, "lxml")

search = soup.find_all("div", {"class": "mb-3"})

print(search)

but the list comes back empty. I took the class name from the HTML shown in my browser's inspector.

I have tried pasting the HTML directly into the Python file, and I have also tried html.parser, but still nothing is returned.

Any help would be much appreciated, even if it is just a suggestion of where to look.

Upvotes: 0

Views: 639

Answers (2)

jizhihaoSAMA

Reputation: 12672

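Check the source of the page: you will find there is no such element in it. The inspector shows the DOM after JavaScript has run, whereas requests only receives the initial HTML. Scrape the API instead: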

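import requests

# JSON endpoint behind the "Ranked Department" tab
url = 'https://api.chambers.com/api/organisations/7/ranked-departments?publicationTypeGroupId=2'
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on an HTTP error
data = response.json()
for location in data['locations']:
    # Keep only the "UK" section
    if location['description'] == 'UK':
        for info in location['rankedEntities']:
            # Department name, then the band of its first ranking
            print(info["displayName"], info['rankings'][0]['rankingDescription'], sep="\n", end="\n\n")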

Output:

Banking & Finance: Borrowers
Band 1

Banking & Finance: Lenders
Band 1

Banking & Finance: Sponsors
Band 2

Capital Markets: Debt
Band 1

Capital Markets: Derivatives
Band 1

Capital Markets: Equity
Band 1

Capital Markets: Securitisation
Band 1

Capital Markets: Structured Finance
Band 1

Competition Law
Band 2

Corporate M&A (International & Cross-Border)
Band 1

Dispute Resolution: International Arbitration
Band 2

Dispute Resolution: Litigation
Band 1

Disputes (International & Cross-Border)
Band 1

Employment
Band 2

Energy & Natural Resources: Oil & Gas
Band 1

Energy & Natural Resources: Power
Band 1

Energy & Natural Resources: Renewables & Alternative Energy
Band 1

Energy Sector (International & Cross-Border)
Band 1

Finance & Capital Markets (International & Cross-Border)
Band 1

Insurance: Mainly Policyholders
Band 1

Intellectual Property
Band 2

Intellectual Property: Patent Litigation
Band 1

Investigations & Enforcement (International & Cross-Border)
Band 2

Investment Funds & Asset Management (International & Cross-Border)
Band 2

Life Sciences & Pharmaceutical Sector (International & Cross-Border)
Band 2

Projects
Band 1

Restructuring/Insolvency
Band 1
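
If you want the results as data rather than printed text, the same loop can be wrapped in a small function. This is only a sketch: the helper name uk_rankings and the sample dictionary are made up for illustration, and the keys are assumed to match the response used above. It also skips entries with an empty rankings list instead of raising an IndexError.

```python
def uk_rankings(payload, location_name="UK"):
    """Return (department, band) pairs for one location in the payload.

    Hypothetical helper: assumes the payload has the same keys as the
    response used above (locations, description, rankedEntities,
    displayName, rankings, rankingDescription).
    """
    results = []
    for location in payload.get("locations", []):
        if location.get("description") != location_name:
            continue
        for entity in location.get("rankedEntities", []):
            rankings = entity.get("rankings") or []
            if not rankings:
                # No ranking recorded for this department, so skip it.
                continue
            results.append(
                (entity.get("displayName"), rankings[0].get("rankingDescription"))
            )
    return results


# Hand-written sample in the assumed shape; this is not real API output.
sample = {
    "locations": [
        {
            "description": "UK",
            "rankedEntities": [
                {"displayName": "Projects",
                 "rankings": [{"rankingDescription": "Band 1"}]},
                {"displayName": "Employment", "rankings": []},
            ],
        },
        {
            "description": "USA",
            "rankedEntities": [
                {"displayName": "Tax",
                 "rankings": [{"rankingDescription": "Band 3"}]},
            ],
        },
    ]
}

for department, band in uk_rankings(sample):
    print(department, band, sep="\n", end="\n\n")
```

With the live API you would pass the parsed JSON from requests.get(url).json() in place of sample.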

Upvotes: 2

Nishith Savla

Reputation: 327

Instead of writing soup.find_all("div", {"class": "mb-3"}) use

soup.find_all("div", class_="mb-3")

Upvotes: 0
