Reputation: 33
I have been trying to write a regular expression for this and I can't get it right. I think the problem is that the div tag and the a href tag are not on the same line, but I am not sure. Please help, I need a regular expression for this. Thanks.
Here is the exact piece of markup that I have been working on; I need to capture its content:
<div class="title">
<a href="http://www.allmusic.com/album/on-air-live-at-the-bbc-vol-2-mw0002581064" data-tooltip="{"id":"MW0002581064","thumbnail":true}">On Air: Live at the BBC, Vol. 2</a> </div>
<div class="artist">
<a href="http://www.allmusic.com/artist/the-beatles-mn0000754032">The Beatles</a> </div>
<div class="year">
2013 </div>
<div class="genres">
Pop/Rock </div>
Upvotes: 1
Views: 153
Reputation: 71538
You could perhaps use BeautifulSoup:
from bs4 import BeautifulSoup
html = '''
<div class="title">
<a href="http://www.allmusic.com/album/on-air-live-at-the-bbc-vol-2-mw0002581064" data-tooltip="{"id":"MW0002581064","thumbnail":true}">On Air: Live at the BBC, Vol. 2</a>
</div>
<div class="artist">
<a href="http://www.allmusic.com/artist/the-beatles-mn0000754032">The Beatles</a>
</div>
<div class="year">
2013
</div>
<div class="genres">
Pop/Rock
</div>
'''
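# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, "html.parser")
# Match every div whose class is one of the four fields of interest
for s in soup.find_all("div", class_=["title", "artist", "year", "genres"]):
    print(s.text.strip())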
Outputs:
On Air: Live at the BBC, Vol. 2
The Beatles
2013
Pop/Rock
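If a regular expression really is required, the line break between the div tag and the a tag can be absorbed with `\s*`, which also matches newlines. The following is only a minimal sketch, assuming the markup always has exactly the shape shown in the question; the names `FIELD_RE` and `extract_fields` are hypothetical, and an HTML parser remains the more robust choice.

```python
import re

html = '''
<div class="title">
<a href="http://www.allmusic.com/album/on-air-live-at-the-bbc-vol-2-mw0002581064" data-tooltip="{"id":"MW0002581064","thumbnail":true}">On Air: Live at the BBC, Vol. 2</a> </div>
<div class="artist">
<a href="http://www.allmusic.com/artist/the-beatles-mn0000754032">The Beatles</a> </div>
<div class="year">
2013 </div>
<div class="genres">
Pop/Rock </div>
'''

# Hypothetical pattern: assumes each field is a div with one of four known
# classes, optionally wrapping its text in a single <a> tag.
FIELD_RE = re.compile(
    r'<div class="(title|artist|year|genres)">\s*'  # opening div, then whitespace (newlines included)
    r'(?:<a\b[^>]*>)?'                              # optional opening <a ...> tag
    r'\s*(.*?)\s*'                                  # the text to capture, matched lazily
    r'(?:</a>)?\s*</div>',                          # optional </a>, then the closing div
    re.DOTALL,                                      # let . cross line breaks inside the text
)

def extract_fields(markup):
    """Return a dict mapping each div class to its stripped text."""
    return {name: text for name, text in FIELD_RE.findall(markup)}

fields = extract_fields(html)
for name, text in fields.items():
    print(name, "->", text)
```

The pattern breaks as soon as the markup gains nested tags, extra classes, or a `>` inside an attribute value, which is why the BeautifulSoup approach above is usually preferable.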
Upvotes: 2