jdgalaway

Reputation: 43

Can't Get the Desired Text in Beautifulsoup

Sorry if the formatting below is off. I'm trying to scrape just the "Jane Doe" text from the HTML below:

<div class="col1 client">
   <a name="12345"></a>
   "Jane Doe"
   <div class="request"><i>insurance claim</i></div>        
</div>

My code below outputs both "Jane Doe" and "insurance claim". How can I get just the "Jane Doe" text? Thank you in advance for your help.

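from bs4 import BeautifulSoup

# `page` is assumed to be an HTTP response fetched earlier (for example with requests.get)
soup = BeautifulSoup(page.content, 'html.parser')
listings = soup.find(id="listings")
listing_items = listings.find_all(class_="col1 client")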

Upvotes: 2

Views: 295

Answers (2)

MITHU

Reputation: 164

Another approach is to select the inner .request div and read the text node that sits immediately before it with previous_sibling:

from bs4 import BeautifulSoup

htmldocs = """
<div class="col1 client">
   <a name="12345"></a>
   "Jane Doe"
   <div class="request"><i>insurance claim</i></div>        
</div>
"""
soup = BeautifulSoup(htmldocs, 'html5lib')
for item in soup.select(".request"):
    print(item.previous_sibling.strip())
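
One caveat with the snippet above: previous_sibling can be None or another tag if the markup differs, and then .strip() raises. Below is a minimal defensive sketch of the same idea. The helper name text_before is hypothetical (not part of bs4), and I use the built-in 'html.parser' here only so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup, NavigableString

html = """
<div class="col1 client">
   <a name="12345"></a>
   "Jane Doe"
   <div class="request"><i>insurance claim</i></div>
</div>
"""


def text_before(tag):
    """Return the stripped text node directly before `tag`, or None.

    Hypothetical helper: returns None when the previous sibling is
    missing, is another tag, or contains only whitespace.
    """
    sibling = tag.previous_sibling
    if isinstance(sibling, NavigableString):
        text = sibling.strip()
        return text or None
    return None


soup = BeautifulSoup(html, "html.parser")

# One entry per .request div; None where no usable text precedes it.
names = [text_before(item) for item in soup.select(".request")]
print(names)
```

Note that the quote characters around Jane Doe are part of the text in this sample markup, so they come back in the result as well.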

Upvotes: 0

QHarr

Reputation: 84475

You want to use next_sibling: select the <a> tag inside the div and read the text node that immediately follows it.

from bs4 import BeautifulSoup

html = '''
<div class="col1 client">
   <a name="12345"></a>
   "Jane Doe"
   <div class="request"><i>insurance claim</i></div>        
</div>
'''

soup = BeautifulSoup(html, 'lxml')
for item in soup.select(".col1.client a"):
    print(item.next_sibling)

Or, as a list comprehension that also strips the surrounding whitespace:

print([item.next_sibling.strip() for item in soup.select(".col1.client a")])
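
If you would rather stay close to the code in the question, another option is to keep only the text nodes that are direct children of each "col1 client" div, which skips anything nested in child tags such as the .request div. This is a rough sketch under a few assumptions: the wrapper with id="listings" is guessed from the question's code (its markup was not shown), and own_text is a hypothetical helper name, not a bs4 function.

```python
from bs4 import BeautifulSoup, NavigableString

# Assumed page structure: the id="listings" wrapper is a guess based on
# the question's code; only the inner div comes from the posted HTML.
html = """
<div id="listings">
  <div class="col1 client">
     <a name="12345"></a>
     "Jane Doe"
     <div class="request"><i>insurance claim</i></div>
  </div>
</div>
"""


def own_text(tag):
    """Join the non-empty text nodes that are direct children of `tag`.

    Hypothetical helper: text inside nested tags is ignored.
    """
    parts = [
        child.strip()
        for child in tag.children
        if isinstance(child, NavigableString)
    ]
    return " ".join(part for part in parts if part)


soup = BeautifulSoup(html, "html.parser")
listings = soup.find(id="listings")
listing_items = listings.find_all(class_="col1 client")

client_names = [own_text(item) for item in listing_items]
print(client_names)
```

Unlike the sibling-based versions, this does not depend on the <a> tag or the .request div being present, only on the name being a direct text child of the client div.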

Upvotes: 1
