Reputation: 89
I am using BeautifulSoup in Python.
<div class="test1">
<a href="www.google.com" blur blur~> text </a>
</div>
<div class="test2">
<a href="www.stackoverflow.com" blur blur~> text </a>
</div>
<div class="test3">
<a href="www.msn.com" blur blur~> text </a>
</div>
<div class="test4">
<a href="www.naver.com" blur blur~> text </a>
</div>
<div class="test5">
<a href="www.ios.com" blur blur~> text </a>
</div>
In a situation like this, I want to get one specific href. For example, how can I use the class name when I need the link with href='www.ios.com'?
The HTML file has more than 1000 'a' elements, and the URLs they contain are dynamic.
How can I get this? Please help.
Upvotes: 1
Views: 15673
Reputation: 21
# results: the matched <div> elements (not defined in this answer)
for item in results:
    a = item.find("a")
    item_href = a['href']
    print(item_href)
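The loop above does not say where `results` comes from. A minimal self-contained sketch, assuming `results` is the list of `div` elements matched by class name with `find_all`; the `html` string is a shortened stand-in for the markup in the question, and the `hrefs` list is only there for illustration:

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the question's markup (assumption: the real page has many more divs).
html = '''<div class="test4"><a href="www.naver.com"> text </a></div>
<div class="test5"><a href="www.ios.com"> text </a></div>'''

soup = BeautifulSoup(html, "html.parser")

# Assumption: `results` holds the <div> elements matched by class name.
results = soup.find_all("div", class_="test5")

hrefs = []
for item in results:
    a = item.find("a")  # first <a> inside the div, or None if there is none
    if a is not None and a.has_attr("href"):
        hrefs.append(a["href"])

print(hrefs)
```

The `None` check guards against a matched `div` that contains no link, which would otherwise raise a `TypeError` on `a['href']`.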
Upvotes: 0
Reputation: 142641
Full working example.
For example, you can use select() with CSS selectors like .class, #id and tag.
from bs4 import BeautifulSoup
content='''<div class="test1">
<a href="www.google.com" blur blur~> text </a>
</div>
<div class="test2">
<a href="www.stackoverflow.com" blur blur~> text </a>
</div>
<div class="test3">
<a href="www.msn.com" blur blur~> text </a>
</div>
<div class="test4">
<a href="www.naver.com" blur blur~> text </a>
</div>
<div class="test5">
<a href="www.ios.com" blur blur~> text </a>
</div>'''
soup = BeautifulSoup(content, 'html.parser')
all_a = soup.select('.test5 a')
for a in all_a:
    print(a['href'])
# Output: www.ios.com
http://www.crummy.com/software/BeautifulSoup/bs4/doc/
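If only one link per class is expected, `select_one` may be more convenient than looping over `select`, and an attribute selector covers the reverse lookup (from a known href back to the class name). This is a sketch under those assumptions, not the only way to do it; the variable names `link`, `href`, `target` and `classes` are illustrative, and the markup is shortened from the example above:

```python
from bs4 import BeautifulSoup

content = '''<div class="test4">
<a href="www.naver.com"> text </a>
</div>
<div class="test5">
<a href="www.ios.com"> text </a>
</div>'''

soup = BeautifulSoup(content, "html.parser")

# Class name -> href: select_one returns the first match, or None if nothing matches.
link = soup.select_one(".test5 a")
href = link["href"] if link is not None else None
print(href)

# href -> class name: an attribute selector finds the <a>,
# and .parent is the element that directly encloses it (here, the <div>).
target = soup.select_one('a[href="www.ios.com"]')
classes = target.parent.get("class") if target is not None else None
print(classes)
```

Note that Beautiful Soup returns `class` as a list, because an element can carry several class names.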
Upvotes: 7