Reputation: 1585
I've browsed previous questions for an hour and tried various solutions, but I can't get this to work. I've extracted the results I want from a website; now I just have to mine these divs for the specific information I want.
The results are isolated like so:
items=soup.findAll(id=re.compile("itembase"))
For each item, I want to extract, for example, the latitude and longitude from this piece of HTML:
<div id="itembase29" class="result-item -result unselected clearfix even"
     data-part="fl_base" data-lat="51.9006" data-lon="-8.51008" data-number="29"
     is-local="true" data-customer="32060963" data-addrid="1"
     data-id="4b00fae498e3cc370133e8a14fd75160">
<div class="arrow">
</div>
How do I do that? Thanks.
Upvotes: 1
Views: 8660
Reputation: 1791
Pass your HTML into BeautifulSoup.
soup = BeautifulSoup(html)
Find the div.
div = soup.div
Get the attributes you're looking for from the div.
lat, lon = div.attrs['data-lat'], div.attrs['data-lon']
Print.
>>> print(lat, lon)
51.9006 -8.51008
I left .attrs in there for clarity, but more generally you can access the attributes of any element like a dictionary, so you don't really need .attrs at all: div['data-lon'] works directly. This obviously doesn't work on a list of divs; you need to iterate over the list.
# divs is the list of matched elements, e.g. items from the question
for div in divs:
    print(div['data-lon'], div['data-lat'])
Or use a list comprehension.
[(div['data-lon'], div['data-lat']) for div in divs]
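Putting the pieces together, here is a minimal sketch of the whole extraction, assuming the bs4 package is installed and using the built-in html.parser backend. The sample markup is trimmed from the question, the second item (itembase30) is made up to show a div without coordinates, and extract_coords is a hypothetical helper name, not part of BeautifulSoup.

```python
import re
from bs4 import BeautifulSoup

# Sample markup modeled on the question's snippet. The second item is
# invented here to show what happens when the coordinates are missing.
html = """
<div id="itembase29" data-lat="51.9006" data-lon="-8.51008" data-number="29">
  <div class="arrow"></div>
</div>
<div id="itembase30" data-number="30"></div>
"""

soup = BeautifulSoup(html, "html.parser")


def extract_coords(soup):
    """Return (lat, lon) float pairs for elements whose id matches 'itembase'."""
    coords = []
    for item in soup.find_all(id=re.compile("itembase")):
        # .get() returns None for a missing attribute, whereas
        # item["data-lat"] would raise KeyError.
        lat, lon = item.get("data-lat"), item.get("data-lon")
        if lat is None or lon is None:
            continue  # skip items that carry no coordinates
        coords.append((float(lat), float(lon)))
    return coords


coords = extract_coords(soup)
print(coords)
```

Attribute values come back as strings, so the sketch converts them with float(). Using .get() rather than indexing is a design choice that lets the loop skip incomplete items instead of stopping at the first one.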
Upvotes: 2