Reputation: 395
I am trying to scrape some profile information from LinkedIn. I came across an HTML structure with the layout below and need to select only the text "Abeokuta, Ogun State", disregarding "Contract".
This is a page sample: https://www.linkedin.com/in/habibulah-oyero-44069a193/
HTML structure:
<p class="pv-entity__secondary-title t-14 t-black t-normal">
Abeokuta, Ogun State
<span class="pv-entity__secondary-title separator">Contract</span>
</p>
Python code:
from bs4 import BeautifulSoup
src = browser.page_source
soup = BeautifulSoup(src, "lxml")
experience_div = soup.find("section", {"id": "experience-section"})
job_div = experience_div.find("div", {"class": "pv-entity__summary-info pv-entity__summary-info--background-section"})
job_location = job_div.find("p", {"class": "pv-entity__secondary-title"}).text.strip()
print(job_location)
This returns:
Abeokuta, Ogun State
Contract
Upvotes: 1
Views: 2065
Reputation: 20008
To get only the first text node inside the tag, you can use the .find_next() method with text=True, which returns only the first match:
from bs4 import BeautifulSoup
html = """<p class="pv-entity__secondary-title t-14 t-black t-normal">
Abeokuta, Ogun State
<span class="pv-entity__secondary-title separator">Contract</span>
</p>"""
soup = BeautifulSoup(html, "html.parser")
print(
soup.find("p", class_="pv-entity__secondary-title t-14 t-black t-normal")
.find_next(text=True)
.strip()
)
Or you can use .contents, which lists the tag's direct children, and take the first one:
print(
soup.find("p", class_="pv-entity__secondary-title t-14 t-black t-normal")
.contents[0]
.strip()
)
Output (in both solutions):
Abeokuta, Ogun State
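Both solutions above match the full class string and assume the wanted text is the very first child of the tag. As a rough sketch of a slightly more tolerant variant (an alternative, not part of the original answer), you could match on the single class pv-entity__secondary-title with a CSS selector and keep only the text nodes that are direct children of the <p>, which skips anything inside the nested <span>. The variable names direct_text and job_location here are just illustrative:

```python
from bs4 import BeautifulSoup, NavigableString

html = """<p class="pv-entity__secondary-title t-14 t-black t-normal">
Abeokuta, Ogun State
<span class="pv-entity__secondary-title separator">Contract</span>
</p>"""

soup = BeautifulSoup(html, "html.parser")

# Match on one class only, so extra utility classes (t-14, t-black, ...) don't matter.
p = soup.select_one("p.pv-entity__secondary-title")

# Keep only non-empty text nodes that are direct children of <p>;
# text inside nested tags such as the <span> is ignored.
direct_text = [
    child.strip()
    for child in p.children
    if isinstance(child, NavigableString) and child.strip()
]

job_location = " ".join(direct_text)
print(job_location)
```

With the sample markup this should print the location without "Contract". Against the live page you would still need to narrow the search to the right experience entry first, as in the question's code.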
Upvotes: 1