Reputation: 39
How would I extract the Agency Fees, Bedrooms, and Bathrooms info using Beautiful Soup in Python? [Here][1] is the webpage I am scraping.
<ul class="important-fields">
<li class="">
<span> Agency Fees: </span>
<strong> AED 5000 </strong>
</li>
<li class="">
<span> Bedrooms: </span>
<strong> Studio </strong>
</li>
<li class="">
<span> Bathrooms: </span>
<strong> 1 </strong>
</li>
<li>
</ul>
Upvotes: 0
Views: 781
Reputation: 369134
>>> from bs4 import BeautifulSoup
>>>
>>> html = '''
... <ul class="important-fields">
... <li class="">
... <span> Agency Fees: </span>
... <strong> AED 5000 </strong>
... </li>
... <li class="">
... <span> Bedrooms: </span>
... <strong> Studio </strong>
... </li>
... <li class="">
... <span> Bathrooms: </span>
... <strong> 1 </strong>
... </li>
... </ul>
... '''
>>>
>>> soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid the bs4 "no parser specified" warning
>>> spans = [x.text.strip() for x in soup.select('ul.important-fields li span')]
>>> strongs = [x.text.strip() for x in soup.select('ul.important-fields li strong')]
>>> spans
[u'Agency Fees:', u'Bedrooms:', u'Bathrooms:']
>>> strongs
[u'AED 5000', u'Studio', u'1']
>>> for name, value in zip(spans, strongs):
...     print('{} {}'.format(name, value))
...
Agency Fees: AED 5000
Bedrooms: Studio
Bathrooms: 1
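Zipping two separate lists works as long as every `<li>` has both a `<span>` and a `<strong>`. If one is missing (the question's snippet has a stray empty `<li>` before `</ul>`), the lists can drift out of step. A minimal sketch of an alternative that pairs label and value per `<li>` is below; the helper name `extract_fields` and the choice to strip the trailing colon from each label are assumptions made for illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup

html = """
<ul class="important-fields">
<li class="">
<span> Agency Fees: </span>
<strong> AED 5000 </strong>
</li>
<li class="">
<span> Bedrooms: </span>
<strong> Studio </strong>
</li>
<li class="">
<span> Bathrooms: </span>
<strong> 1 </strong>
</li>
<li>
</ul>
"""


def extract_fields(markup):
    """Return a dict mapping each label to its value (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    fields = {}
    for li in soup.select("ul.important-fields li"):
        label = li.find("span")
        value = li.find("strong")
        # Skip items that lack either part, such as the stray empty <li>.
        if label is None or value is None:
            continue
        key = label.get_text(strip=True).rstrip(":")
        fields[key] = value.get_text(strip=True)
    return fields


fields = extract_fields(html)
for name, value in fields.items():
    print("{}: {}".format(name, value))
```

With a dict, a single field can then be looked up directly, for example `fields["Bedrooms"]`.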
Upvotes: 2
Reputation: 928
You can use XPath (http://www.w3schools.com/xpath/) to pull the data out of the HTML with the lxml library in Python. The lxml tutorial (http://lxml.de/tutorial.html) has examples.
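As a rough sketch of what the XPath approach might look like for the snippet in the question, assuming lxml is installed: the variable names `page`, `tree`, and `fields` are illustrative, and the XPath expressions are one possible choice rather than the only way to write them.

```python
from lxml import html as lxml_html

page = """
<ul class="important-fields">
<li class="">
<span> Agency Fees: </span>
<strong> AED 5000 </strong>
</li>
<li class="">
<span> Bedrooms: </span>
<strong> Studio </strong>
</li>
<li class="">
<span> Bathrooms: </span>
<strong> 1 </strong>
</li>
</ul>
"""

tree = lxml_html.fromstring(page)

fields = {}
# Select each <li> directly under the target list.
for li in tree.xpath('//ul[@class="important-fields"]/li'):
    # normalize-space() returns the trimmed text of the first matching child,
    # or an empty string if there is no such child.
    label = li.xpath("normalize-space(span)")
    value = li.xpath("normalize-space(strong)")
    if label:
        fields[label.rstrip(":")] = value

for name, value in fields.items():
    print("{}: {}".format(name, value))
```

When scraping the live page, the same expressions would be applied to the downloaded HTML instead of the inline string.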
Upvotes: 0