sotokan80

Reputation: 59

Get text from a div using beautifulsoup4

I want to use Python and bs4 to extract only the place name from the following HTML:

<div class="results-list" id="theaterlist">
 <table>
  <tr class="trspacer">
   <td>
    <a href="theater.aspx?id=4000642">
     <h2 class="placename">
      Hyde Park
      <span class="boldelement">
      Richmond Avenue 56 ls61bz
      </span>
     </h2>
    </a>

I'm using the following code, but I get the address too:

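mydivs = soup.find("div", {"id": "theaterlist"})
# Quote the attribute value: "theater.aspx" contains a dot, so it is not a valid bare CSS identifier
lis = mydivs.select('a[href*="theater.aspx"]')
for x in lis:
    theater = x.find('h2', class_='placename')
    print(theater.text)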

Any help would be appreciated.

Upvotes: 1

Views: 625

Answers (3)

SIM

Reputation: 22440

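Try this. theater.contents[0] is the first child node of the <h2>, i.e. the text that comes before the <span>: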

for x in soup.select('a[href*="theater.aspx"]'):
    theater = x.find('h2', class_='placename')
    print(theater.contents[0].strip())
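A self-contained sketch of the same approach, runnable as-is. It assumes the stdlib "html.parser" backend (so lxml is not required) and reuses the question's sample markup; the names are collected into a list only so the result is easy to check.

```python
from bs4 import BeautifulSoup

# Sample markup from the question.
html = """
<div class="results-list" id="theaterlist">
 <table>
  <tr class="trspacer">
   <td>
    <a href="theater.aspx?id=4000642">
     <h2 class="placename">
      Hyde Park
      <span class="boldelement">
      Richmond Avenue 56 ls61bz
      </span>
     </h2>
    </a>
"""

soup = BeautifulSoup(html, "html.parser")

names = []
for a in soup.select('a[href*="theater.aspx"]'):
    h2 = a.find("h2", class_="placename")
    # .contents lists the direct children of <h2>; index 0 is the text node
    # before the <span> (the place name plus surrounding whitespace).
    names.append(h2.contents[0].strip())

print(names)  # ['Hyde Park']
```

This relies on the place name being the very first child of the <h2>; if the markup ever puts a tag before it, contents[0] would be that tag instead.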

Upvotes: 0

Sunitha

Reputation: 12015

soup.find("div", {"id": "theaterlist"}).find('h2', class_='placename').text.strip()
# 'Hyde Park\n      \n      Richmond Avenue 56 ls61bz'
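A minimal sketch of why the address shows up: .text (the same as .get_text()) joins every text node under the tag, including the one inside the <span>. The second half swaps in a different technique, removing the <span> with decompose() before reading the text. It assumes the address is always inside span.boldelement and that modifying the parsed tree is acceptable; the markup is a trimmed version of the question's.

```python
from bs4 import BeautifulSoup

html = """
<h2 class="placename">
  Hyde Park
  <span class="boldelement">Richmond Avenue 56 ls61bz</span>
</h2>
"""

soup = BeautifulSoup(html, "html.parser")
h2 = soup.find("h2", class_="placename")

# get_text() collects all descendant strings, so the <span> text is included.
full = h2.get_text(" ", strip=True)
print(full)  # Hyde Park Richmond Avenue 56 ls61bz

# Drop the <span> from the tree first; only the place name text remains.
# Note: decompose() mutates the parsed document.
h2.find("span", class_="boldelement").decompose()
name = h2.get_text(strip=True)
print(name)  # Hyde Park
```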

Upvotes: 0

Andrej Kesely

Reputation: 195438

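To get only the element's own text (not the text of its child elements), you can use .find(string=True), spelled .find(text=True) in older Beautiful Soup versions. It returns the first text node inside the element, which here is the text before the <span>: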

data = """
<div class="results-list" id="theaterlist">
 <table>
  <tr class="trspacer">
   <td>
    <a href="theater.aspx?id=4000642">
     <h2 class="placename">
      Hyde Park
      <span class="boldelement">
      Richmond Avenue 56 ls61bz
      </span>
     </h2>
    </a>
"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'lxml')
# string=True is the newer name for the text=True argument (Beautiful Soup 4.4+)
print(soup.find('h2').find(string=True).strip())

Prints:

Hyde Park
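A related variant, offered as a sketch rather than part of the original answer: .find(string=True) searches all descendants, so if the <h2> ever started with a child tag it would return that tag's text instead. Passing recursive=False to find_all() restricts the search to the <h2>'s direct children, and whitespace-only nodes are filtered out. It assumes the stdlib "html.parser" backend and a trimmed version of the question's markup.

```python
from bs4 import BeautifulSoup

html = """
<h2 class="placename">
  Hyde Park
  <span class="boldelement">Richmond Avenue 56 ls61bz</span>
</h2>
"""

soup = BeautifulSoup(html, "html.parser")
h2 = soup.find("h2", class_="placename")

# recursive=False: only text nodes that are direct children of <h2>,
# so the string inside <span> is never visited.
own_text = [s.strip() for s in h2.find_all(string=True, recursive=False) if s.strip()]
print(own_text)  # ['Hyde Park']
```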

Upvotes: 3
