user16091296

Reputation: 21

Extract information with the same tag

For the Zillow data below, the number of beds (bds), number of baths (ba), and square footage (sqft) all have the same tag, <li class="">. How can I get the information for these three elements? My code below is clearly not working. The result should be: 3, 2, 1813

Can you please help? Thanks, Hong

# <div class="list-card-info"><a class="list-card-link list-card-link-top-margin" href="https://www.zillow.com/homedetails/12021-Tralee-Rd-UNIT-102-Lutherville-MD-21093/60873148_zpid/" tabindex="0">
# <address class="list-card-addr">12021 Tralee Rd UNIT 102, Lutherville, MD 21093</address></a>
# <div class="list-card-footer"><p class="list-card-extra-info">LONG &amp; FOSTER REAL ESTATE, INC.</p></div><div class="list-card-heading">
# <div class="list-card-price">$411,000</div><ul class="list-card-details">
# <li class="">3<abbr class="list-card-label"> <!-- -->bds</abbr></li>
# <li class="">2<abbr class="list-card-label"> <!-- -->ba</abbr></li>
# <li class="">1,813<abbr class="list-card-label"> <!-- -->sqft</abbr>
# </li><li class="list-card-statusText">- Apartment for sale</li></ul></div></div>

from bs4 import BeautifulSoup

tag='<div class="list-card-info"><a class="list-card-link list-card-link-top-margin" href="https://www.zillow.com/homedetails/12021-Tralee-Rd-UNIT-102-Lutherville-MD-21093/60873148_zpid/" tabindex="0"><address class="list-card-addr">12021 Tralee Rd UNIT 102, Lutherville, MD 21093</address></a><div class="list-card-footer"><p class="list-card-extra-info">LONG &amp; FOSTER REAL ESTATE, INC.</p></div><div class="list-card-heading"><div class="list-card-price">$411,000</div><ul class="list-card-details"><li class="">3<abbr class="list-card-label"> <!-- -->bds</abbr></li><li class="">2<abbr class="list-card-label"> <!-- -->ba</abbr></li><li class="">1,813<abbr class="list-card-label"> <!-- -->sqft</abbr></li><li class="list-card-statusText">- Apartment for sale</li></ul></div></div>'
tag = BeautifulSoup(tag, 'html.parser')

address = tag.findAll('address', {'class': 'list-card-addr'})
price   = tag.findAll('div', {'class': 'list-card-price'})
beds    = tag.findAll('li', {'class': ""})

# keep text only, remove tag
address = address[0].text
price = price[0].text
beds = beds[0].text
print(beds)
print(address, '---', price, '---', beds)

Upvotes: 0

Views: 60

Answers (2)

0xd34dc0de

Reputation: 523

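This should do it. Select the three <li class=""> items and use a regular expression to keep only the number from each one: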

import re
from bs4 import BeautifulSoup

tag='<div class="list-card-info"><a class="list-card-link list-card-link-top-margin" href="https://www.zillow.com/homedetails/12021-Tralee-Rd-UNIT-102-Lutherville-MD-21093/60873148_zpid/" tabindex="0"><address class="list-card-addr">12021 Tralee Rd UNIT 102, Lutherville, MD 21093</address></a><div class="list-card-footer"><p class="list-card-extra-info">LONG &amp; FOSTER REAL ESTATE, INC.</p></div><div class="list-card-heading"><div class="list-card-price">$411,000</div><ul class="list-card-details"><li class="">3<abbr class="list-card-label"> <!-- -->bds</abbr></li><li class="">2<abbr class="list-card-label"> <!-- -->ba</abbr></li><li class="">1,813<abbr class="list-card-label"> <!-- -->sqft</abbr></li><li class="list-card-statusText">- Apartment for sale</li></ul></div></div>'
tag = BeautifulSoup(tag, 'html.parser')

# the three <li class=""> items are, in order: beds, baths, square feet
list_items = tag.findAll('li', {'class': ""})

# keep only the leading number of each item, dropping the label text
regex = re.compile(r'[\d,]+')
beds = regex.findall(list_items[0].text)[0]
baths = regex.findall(list_items[1].text)[0]
# drop the thousands separator so "1,813" becomes "1813"
sqft = regex.findall(list_items[2].text)[0].replace(',', '')

print(beds, '---', baths, '---', sqft)
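If you want actual integers rather than strings, here is a minimal sketch of the same regex idea. It assumes the listing markup looks like the snippet in the question; the helper name `leading_number`, the shortened `DETAILS_HTML` sample, and the CSS selector are my own choices for illustration, not something Zillow guarantees.

```python
import re
from bs4 import BeautifulSoup

# Shortened copy of the <ul> from the question (assumed to be representative).
DETAILS_HTML = (
    '<ul class="list-card-details">'
    '<li class="">3<abbr class="list-card-label"> <!-- -->bds</abbr></li>'
    '<li class="">2<abbr class="list-card-label"> <!-- -->ba</abbr></li>'
    '<li class="">1,813<abbr class="list-card-label"> <!-- -->sqft</abbr></li>'
    '<li class="list-card-statusText">- Apartment for sale</li>'
    '</ul>'
)


def leading_number(text):
    """Return the number at the start of text as an int, or None if there is none."""
    match = re.match(r"\s*([\d,]+)", text)
    if match is None:
        return None
    digits = match.group(1).replace(",", "")  # "1,813" -> "1813"
    return int(digits) if digits else None


soup = BeautifulSoup(DETAILS_HTML, "html.parser")

# Take every <li> in the details list and keep only those that start with a number,
# which skips the "- Apartment for sale" status item.
numbers = (leading_number(li.get_text()) for li in soup.select("ul.list-card-details li"))
values = [n for n in numbers if n is not None]

beds, baths, sqft = values
print(beds, baths, sqft)
```

Filtering on "starts with a number" avoids relying on the empty class attribute, but it still assumes the items come in the order beds, baths, square feet.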

Upvotes: 0

Kirsten_J

Reputation: 96

When you call tag.findAll, it returns a ResultSet containing all three matching <li> elements. You can then access each one by its index, as shown below.

from bs4 import BeautifulSoup

tag= '<div class="list-card-info"><a class="list-card-link list-card-link-top-margin" href="https://www.zillow.com/homedetails/12021-Tralee-Rd-UNIT-102-Lutherville-MD-21093/60873148_zpid/" tabindex="0"><address class="list-card-addr">12021 Tralee Rd UNIT 102, Lutherville, MD 21093</address></a><div class="list-card-footer"><p class="list-card-extra-info">LONG &amp; FOSTER REAL ESTATE, INC.</p></div><div class="list-card-heading"><div class="list-card-price">$411,000</div><ul class="list-card-details"><li class="">3<abbr class="list-card-label"> <!-- -->bds</abbr></li><li class="">2<abbr class="list-card-label"> <!-- -->ba</abbr></li><li class="">1,813<abbr class="list-card-label"> <!-- -->sqft</abbr></li><li class="list-card-statusText">- Apartment for sale</li></ul></div></div>'

tag = BeautifulSoup(tag, 'html.parser')

tags = tag.findAll('li', {'class': ""})

# keep text only, remove tag
# note: .text includes the label, e.g. "3 bds"
beds = tags[0].text
baths = tags[1].text
sqft = tags[2].text
print(beds, '---', baths, '---', sqft)
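Indexing by position works as long as the three items always appear in the same order. As an alternative sketch, you could key each value by its own label instead. This assumes every number sits directly before an <abbr class="list-card-label"> element, as in the question's HTML; the function name `parse_details` and the shortened `DETAILS_HTML` sample are hypothetical.

```python
from bs4 import BeautifulSoup

# Shortened copy of the <ul> from the question (assumed to be representative).
DETAILS_HTML = (
    '<ul class="list-card-details">'
    '<li class="">3<abbr class="list-card-label"> <!-- -->bds</abbr></li>'
    '<li class="">2<abbr class="list-card-label"> <!-- -->ba</abbr></li>'
    '<li class="">1,813<abbr class="list-card-label"> <!-- -->sqft</abbr></li>'
    '<li class="list-card-statusText">- Apartment for sale</li>'
    '</ul>'
)


def parse_details(html):
    """Map each label ("bds", "ba", "sqft") to the text that precedes it."""
    soup = BeautifulSoup(html, "html.parser")
    details = {}
    for label in soup.find_all("abbr", class_="list-card-label"):
        key = label.get_text(strip=True)   # label text without the HTML comment
        value = label.previous_sibling     # the text node just before the <abbr>
        if key and isinstance(value, str):
            details[key] = value.strip()
    return details


details = parse_details(DETAILS_HTML)
print(details["bds"], details["ba"], details["sqft"])
```

The values stay as strings here, so "1,813" keeps its comma; strip it and convert with int() if you need a number.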

Upvotes: 1
