Reputation: 169
I am trying to scrape information from chambers.com, specifically, in this example, https://chambers.com/law-firm/allen-overy-llp-global-2:7. What I want are the different departments and their bands listed under the "UK" section of the "Ranked Department" tab.
The problem I'm currently having is with Beautiful Soup's find_all, and I assume the parser. I want to find all <div class="mb-3"> elements.
The code I have so far is:
```python
import requests
from bs4 import BeautifulSoup

url_to_scrape = 'https://chambers.com/law-firm/allen-overy-llp-global-2:7'
# Download the raw HTML exactly as the server sends it
plain_html_text = requests.get(url_to_scrape)
soup = BeautifulSoup(plain_html_text.content, "lxml")
# Look for every <div> that carries the class "mb-3"
search = soup.find_all("div", {"class": "mb-3"})
print(search)
```
The list comes back empty. I took the class name from the HTML shown by my browser's inspector.
I have also tried pasting the HTML directly into the Python file, and using html.parser instead of lxml, but still nothing is returned.
Any help would be much appreciated, even if it is just a suggestion of where to look.
Upvotes: 0
Views: 639
Reputation: 12672
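Check the page source and you will find there is no such element in it. The browser's inspector shows the page after its scripts have run, whereas requests only receives the raw HTML; the rankings are fetched separately from an API. Scrape the API instead: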
```python
import requests

# JSON endpoint that serves the "Ranked Departments" data
url = 'https://api.chambers.com/api/organisations/7/ranked-departments?publicationTypeGroupId=2'
response = requests.get(url).json()

for location in response['locations']:
    if location['description'] == 'UK':
        for info in location['rankedEntities']:
            # Department name, then the band of its first ranking (e.g. "Band 1")
            print(info["displayName"], info['rankings'][0]['rankingDescription'], sep="\n", end="\n\n")
```
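If you want the UK rankings as data rather than printed text, the loop above can be wrapped in a function that takes the already-decoded JSON. This is a minimal sketch that assumes the response keeps the shape the answer's code relies on (locations, description, rankedEntities, displayName, rankings, rankingDescription). The function name ranked_departments and the sample dictionary are hypothetical, made up for illustration; the sample is not real API output. Entities with an empty rankings list are skipped instead of raising IndexError.

```python
from typing import Any, Dict, List, Tuple


def ranked_departments(data: Dict[str, Any], location: str = "UK") -> List[Tuple[str, str]]:
    """Return (department, band) pairs for one location.

    Assumes the JSON shape used in the answer's loop:
    data["locations"][i]["description"] and
    data["locations"][i]["rankedEntities"][j]["rankings"][k]["rankingDescription"].
    """
    results: List[Tuple[str, str]] = []
    for loc in data.get("locations", []):
        if loc.get("description") != location:
            continue
        for entity in loc.get("rankedEntities", []):
            rankings = entity.get("rankings") or []
            # Skip entities without rankings instead of failing on rankings[0]
            if not rankings:
                continue
            results.append((entity["displayName"], rankings[0]["rankingDescription"]))
    return results


if __name__ == "__main__":
    # Hypothetical sample in the assumed shape (not real API output)
    sample = {
        "locations": [
            {"description": "UK", "rankedEntities": [
                {"displayName": "Banking & Finance: Borrowers",
                 "rankings": [{"rankingDescription": "Band 1"}]},
                {"displayName": "Hypothetical unranked entity", "rankings": []},
            ]},
        ]
    }
    for name, band in ranked_departments(sample):
        print(name, band, sep="\n", end="\n\n")
```

Against the live endpoint this would be called as ranked_departments(requests.get(url).json()), assuming the API still returns the same structure.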
Output:
```
Banking & Finance: Borrowers
Band 1

Banking & Finance: Lenders
Band 1

Banking & Finance: Sponsors
Band 2

Capital Markets: Debt
Band 1

Capital Markets: Derivatives
Band 1

Capital Markets: Equity
Band 1

Capital Markets: Securitisation
Band 1

Capital Markets: Structured Finance
Band 1

Competition Law
Band 2

Corporate M&A (International & Cross-Border)
Band 1

Dispute Resolution: International Arbitration
Band 2

Dispute Resolution: Litigation
Band 1

Disputes (International & Cross-Border)
Band 1

Employment
Band 2

Energy & Natural Resources: Oil & Gas
Band 1

Energy & Natural Resources: Power
Band 1

Energy & Natural Resources: Renewables & Alternative Energy
Band 1

Energy Sector (International & Cross-Border)
Band 1

Finance & Capital Markets (International & Cross-Border)
Band 1

Insurance: Mainly Policyholders
Band 1

Intellectual Property
Band 2

Intellectual Property: Patent Litigation
Band 1

Investigations & Enforcement (International & Cross-Border)
Band 2

Investment Funds & Asset Management (International & Cross-Border)
Band 2

Life Sciences & Pharmaceutical Sector (International & Cross-Border)
Band 2

Projects
Band 1

Restructuring/Insolvency
Band 1
```
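To check the diagnosis in this answer yourself (that the raw HTML served to requests contains no <div class="mb-3">), you can count matching divs in whatever HTML you downloaded. The sketch below uses the standard library's html.parser.HTMLParser instead of Beautiful Soup so it has no third-party dependency; the class DivClassCounter, the function count_divs_with_class, and both HTML strings are hypothetical stand-ins, not the real chambers.com markup. Note that a class attribute can hold several whitespace-separated classes, so the value is split before comparing.

```python
from html.parser import HTMLParser


class DivClassCounter(HTMLParser):
    """Count <div> start tags whose class attribute includes class_name."""

    def __init__(self, class_name: str) -> None:
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        for name, value in attrs:
            # class="mb-3 px-2" holds two classes; match any one of them exactly
            if name == "class" and value and self.class_name in value.split():
                self.count += 1


def count_divs_with_class(html: str, class_name: str) -> int:
    parser = DivClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count


if __name__ == "__main__":
    # Hypothetical stand-ins: an empty shell as served vs. the DOM after scripts ran
    raw_html = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'
    rendered_dom = '<div id="app"><div class="mb-3">Banking &amp; Finance</div></div>'
    print(count_divs_with_class(raw_html, "mb-3"))      # 0
    print(count_divs_with_class(rendered_dom, "mb-3"))  # 1
```

On the real page you would pass requests.get(url_to_scrape).text; a result of 0 there, while the inspector shows the element, points to content added by JavaScript after the page loads.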
Upvotes: 2
Reputation: 327
Instead of writing
soup.find_all("div", {"class": "mb-3"})
use
soup.find_all("div", class_="mb-3")
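The snippet below compares the two spellings on a small hypothetical HTML fragment (not the chambers.com page). Beautiful Soup uses the keyword class_ because class is a reserved word in Python, and it treats class as a multi-valued attribute, so a div with class="mb-3 px-2" also matches "mb-3". In the Beautiful Soup versions I know of, passing {"class": "mb-3"} as the attrs argument selects the same elements, so on HTML that lacks the div (as the raw page does, per the answer above) both forms return an empty list. It uses "html.parser" so lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment for illustration only
html = """
<div class="mb-3">UK</div>
<div class="mb-3 px-2">Band 1</div>
<div class="mb-4">Other</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Keyword form: class_ avoids the reserved word "class"
by_keyword = soup.find_all("div", class_="mb-3")
# Dictionary form from the question: the same filter passed as the attrs argument
by_attrs = soup.find_all("div", {"class": "mb-3"})

print([d.get_text() for d in by_keyword])  # ['UK', 'Band 1']
print(by_keyword == by_attrs)              # True
```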
Upvotes: 0