User08

Regular expression for html tags

This is the exact piece of HTML I have been working on, and I need to capture this content:

  1. On Air: Live at the BBC, Vol. 2
  2. The Beatles
  3. 2013
  4. Pop/Rock

I have been trying to write a regular expression for this, but I can't get it fully right. I think the problem may be that the div tag and the a href tag are not on the same line, but I am not sure. Please help; I need a regular expression for this. Thanks.

<div class="title">
            <a href="http://www.allmusic.com/album/on-air-live-at-the-bbc-vol-2-mw0002581064" data-tooltip="{&quot;id&quot;:&quot;MW0002581064&quot;,&quot;thumbnail&quot;:true}">On Air: Live at the BBC, Vol. 2</a>            </div>

                <div class="artist">
                <a href="http://www.allmusic.com/artist/the-beatles-mn0000754032">The Beatles</a>            </div>

                <div class="year">
            2013            </div>

                <div class="genres">
            Pop/Rock            </div>
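Regarding the line-break suspicion: in Python's `re` module (assuming Python, which the answer below also uses), `.` does not match a newline unless the `re.DOTALL` flag is set. A pattern such as `<div class="(\w+)">(.*?)</div>` therefore finds nothing here, because every opening `<div>` is immediately followed by a line break. The sketch below uses an abbreviated copy of the markup above (the `data-tooltip` attribute is omitted) and shows both the failing pattern and a `re.DOTALL` version. It assumes each `class` attribute is written exactly as `class="..."` and that these divs contain no nested `<div>`; a real HTML parser, as in the answer below, is more robust than a regular expression.

```python
import re

# Abbreviated copy of the question's markup; each opening <div> is followed
# by a line break, just as in the original page.
html = '''
<div class="title">
    <a href="http://www.allmusic.com/album/on-air-live-at-the-bbc-vol-2-mw0002581064">On Air: Live at the BBC, Vol. 2</a>    </div>
<div class="artist">
    <a href="http://www.allmusic.com/artist/the-beatles-mn0000754032">The Beatles</a>    </div>
<div class="year">
    2013    </div>
<div class="genres">
    Pop/Rock    </div>
'''

# Without re.DOTALL, "." cannot cross the newline after each <div ...>,
# so no <div> ... </div> pair can be matched.
naive = re.compile(r'<div class="(\w+)">(.*?)</div>')
print(naive.findall(html))  # []

# With re.DOTALL, ".*?" may span lines; being non-greedy, each match
# still stops at the first following </div>.
pattern = re.compile(r'<div class="(title|artist|year|genres)">(.*?)</div>', re.DOTALL)


def clean(fragment):
    """Remove inner tags (such as the <a> link) and surrounding whitespace."""
    return re.sub(r'<[^>]+>', '', fragment).strip()


# findall returns (class name, inner HTML) tuples in document order.
fields = {name: clean(body) for name, body in pattern.findall(html)}
for value in fields.values():
    print(value)
```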

Jerry

You could use BeautifulSoup, an HTML parser, instead of a regular expression:

from bs4 import BeautifulSoup
html = '''
    <div class="title">
        <a href="http://www.allmusic.com/album/on-air-live-at-the-bbc-vol-2-mw0002581064" data-tooltip="{&quot;id&quot;:&quot;MW0002581064&quot;,&quot;thumbnail&quot;:true}">On Air: Live at the BBC, Vol. 2</a>
    </div>
    <div class="artist">
        <a href="http://www.allmusic.com/artist/the-beatles-mn0000754032">The Beatles</a>
    </div>
    <div class="year">
        2013
    </div>
    <div class="genres">
        Pop/Rock
    </div>
    '''

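# Name the parser explicitly; without it, bs4 guesses one and emits a warning.
soup = BeautifulSoup(html, "html.parser")

# class_ matches any <div> whose class is one of the listed names.
for s in soup.find_all("div", class_=["title", "artist", "year", "genres"]):
    print(s.text.strip())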

Output:

On Air: Live at the BBC, Vol. 2
The Beatles
2013
Pop/Rock
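If installing BeautifulSoup is not an option, Python's standard-library `html.parser.HTMLParser` can collect the same four fields, here keyed by class name. This is a minimal sketch, not part of the original answer: `AlbumInfoParser` and `WANTED` are illustrative names, and the code assumes the `class` attribute holds exactly one of the four names and that these divs contain no nested `<div>`.

```python
from html.parser import HTMLParser

# Hypothetical set of class names to capture, taken from the question.
WANTED = {"title", "artist", "year", "genres"}


class AlbumInfoParser(HTMLParser):
    """Collect the text inside <div class="title|artist|year|genres">."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # class of the wanted <div> we are inside, if any

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            cls = dict(attrs).get("class")
            if cls in WANTED:
                self._current = cls
                self.fields[cls] = ""

    def handle_endtag(self, tag):
        # Assumes no nested <div>: any </div> closes the wanted one.
        if tag == "div":
            self._current = None

    def handle_data(self, data):
        # Text inside inner tags such as <a> also arrives here.
        if self._current is not None:
            self.fields[self._current] += data


html = '''
<div class="title">
    <a href="http://www.allmusic.com/album/on-air-live-at-the-bbc-vol-2-mw0002581064">On Air: Live at the BBC, Vol. 2</a>    </div>
<div class="artist">
    <a href="http://www.allmusic.com/artist/the-beatles-mn0000754032">The Beatles</a>    </div>
<div class="year">
    2013    </div>
<div class="genres">
    Pop/Rock    </div>
'''

parser = AlbumInfoParser()
parser.feed(html)
parser.close()  # flush any buffered text
info = {name: text.strip() for name, text in parser.fields.items()}
for name, value in info.items():
    print(f"{name}: {value}")
```

Because the link text is delivered through `handle_data` while the parser is still inside the wanted `<div>`, the album title and artist name are captured without any special handling for the `<a>` tags.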