oyerohabib

Reputation: 395

How to select a specific element using BeautifulSoup

I am trying to scrape some profile information from LinkedIn. I came across an HTML structure with the layout below and need to select only "Abeokuta, Ogun State", disregarding "Contract".

This is a page sample: https://www.linkedin.com/in/habibulah-oyero-44069a193/

HTML structure:

<p class="pv-entity__secondary-title t-14 t-black t-normal">
      Abeokuta, Ogun State
      <span class="pv-entity__secondary-title separator">Contract</span>
</p>

Python code:

from bs4 import BeautifulSoup

# browser is presumably a Selenium WebDriver that has already loaded the profile page
src = browser.page_source
soup = BeautifulSoup(src, "lxml")

experience_div = soup.find("section", {"id": "experience-section"})

job_div = experience_div.find("div", {"class": "pv-entity__summary-info pv-entity__summary-info--background-section"})

job_location = job_div.find("p", {"class": "pv-entity__secondary-title"}).text.strip()

print(job_location)

This returns:

Abeokuta, Ogun State
        Contract

Upvotes: 1

Views: 2065

Answers (1)

MendelG

Reputation: 20008

To get only the first text node inside the <p> tag (the location, without the text of the nested <span>), you can use the .find_next() method restricted to text nodes. It returns only the first match:

from bs4 import BeautifulSoup


html = """<p class="pv-entity__secondary-title t-14 t-black t-normal">
      Abeokuta, Ogun State
      <span class="pv-entity__secondary-title separator">Contract</span>
</p>"""

soup = BeautifulSoup(html, "html.parser")

print(
    soup.find("p", class_="pv-entity__secondary-title t-14 t-black t-normal")
    .find_next(string=True)  # "string" is the current name of the older "text" argument
    .strip()
)

Alternatively, you can use .contents, whose first item here is the text node that precedes the <span>:

print(
    soup.find("p", class_="pv-entity__secondary-title t-14 t-black t-normal")
    .contents[0]
    .strip()
)

Output (in both solutions):

Abeokuta, Ogun State
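
Both approaches rely on the location being the very first child of the <p> tag. If that cannot be guaranteed, one possible variant is to collect only the text nodes that are direct children of the tag and skip nested tags such as the <span>. The sketch below is an illustration, not part of the original answer: the helper name own_text is hypothetical, and the single-class selector assumes that pv-entity__secondary-title alone identifies the paragraph, as in the question's code.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

html = """<p class="pv-entity__secondary-title t-14 t-black t-normal">
      Abeokuta, Ogun State
      <span class="pv-entity__secondary-title separator">Contract</span>
</p>"""


def own_text(tag):
    """Hypothetical helper: join the text nodes that are direct children of tag."""
    parts = (
        child.strip()
        for child in tag.children
        if isinstance(child, NavigableString)  # skips nested tags like <span>
    )
    # Drop whitespace-only nodes, e.g. the newline after the <span>
    return " ".join(part for part in parts if part)


soup = BeautifulSoup(html, "html.parser")

# Match on one class so the lookup does not depend on the full class string
location_tag = soup.select_one("p.pv-entity__secondary-title")
job_location = own_text(location_tag)
print(job_location)
```

In the question's code, the same helper could be applied to the result of job_div.find("p", {"class": "pv-entity__secondary-title"}) in place of .text.strip().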

Upvotes: 1
