Damien

Reputation: 611

Beautiful Soup - taking classes out of an HTML file

I have an HTML file and I want to grab the text from this block:

 <strong class="fullname js-action-profile-name">User Name</strong>
    <span>&rlm;</span>
    <span class="username js-action-profile-name"><s>@</s><b>UserName</b></span>

I want it to display as:

User Name
@UserName

How would I do this using Beautiful Soup?

Upvotes: 3

Views: 239

Answers (3)

joe

Reputation: 817

from bs4 import BeautifulSoup

html = '''<strong class="fullname js-action-profile-name">User Name</strong>
    <span>&rlm;</span>
    <span class="username js-action-profile-name"><s>@</s><b>UserName</b></span>'''

# Name the parser explicitly so bs4 does not have to guess one.
soup = BeautifulSoup(html, 'html.parser')

# Match each element on its full class attribute string, then read its text.
username = soup.find(attrs={'class':'username js-action-profile-name'}).text
fullname = soup.find(attrs={'class':'fullname js-action-profile-name'}).text

print(fullname)
print(username)

Outputs:

User Name
@UserName

Two notes:

  1. Use bs4 (Beautiful Soup 4) if you're starting something new or just learning Beautiful Soup.

  2. You will probably be loading your HTML from an external file, so pass a file object in place of the html string.
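
As a rough sketch of the second note, the constructor accepts an open file object as well as a string. The temporary file below is only a hypothetical stand-in so the example runs on its own; in practice you would open your own HTML file.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Hypothetical stand-in for your real file: write the question's markup to a temp file.
markup = '''<strong class="fullname js-action-profile-name">User Name</strong>
    <span>&rlm;</span>
    <span class="username js-action-profile-name"><s>@</s><b>UserName</b></span>'''

with tempfile.NamedTemporaryFile('w', suffix='.html', delete=False, encoding='utf-8') as tmp:
    tmp.write(markup)
    path = tmp.name

# Pass the open file object straight to BeautifulSoup instead of a string.
with open(path, encoding='utf-8') as f:
    soup = BeautifulSoup(f, 'html.parser')
os.remove(path)  # clean up the stand-in file

fullname = soup.find(attrs={'class': 'fullname js-action-profile-name'}).text
username = soup.find(attrs={'class': 'username js-action-profile-name'}).text

print(fullname)
print(username)
```

The document is parsed inside the `with` block, so the file can be closed (and here deleted) before the tree is searched.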

Upvotes: 1

dank.game

Reputation: 4709

This assumes index.html contains the markup from the question:

from bs4 import BeautifulSoup  # bs4 replaces the old BeautifulSoup 3 module

def displayUserInfo():
    # Parse the file; the context manager closes it afterwards.
    with open("index.html") as f:
        soup = BeautifulSoup(f, "html.parser")

    fullname_ele = soup.find(attrs={"class": "fullname js-action-profile-name"})
    fullname = fullname_ele.contents[0]
    print(fullname)

    username_ele = soup.find(attrs={"class": "username js-action-profile-name"})
    username = ""
    # Concatenate the text of each child tag (<s> and <b>).
    for child in username_ele.find_all():
        username += child.contents[0]
    print(username)

if __name__ == '__main__':
    displayUserInfo()

# prints:
# User Name
# @UserName

Upvotes: 0

Mike Axiak

Reputation: 11996

Use the "text" attribute. Example:

>>> import BeautifulSoup

>>> b = BeautifulSoup.BeautifulStoneSoup(open('/tmp/x.html'), convertEntities=BeautifulSoup.BeautifulStoneSoup.HTML_ENTITIES)

>>> print b.find(attrs={"id": "container"}).text
User Name‏@UserName

In x.html I have a div with an id of "container" that wraps the HTML you provided. Note that I convert the &rlm; entity to \u200f with BeautifulStoneSoup. To insert a newline (which a browser would not introduce), just replace u'\u200f' with '\n'.
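
For anyone on bs4 rather than the older BeautifulSoup 3 module used above, here is a minimal sketch of the same idea. It assumes the question's markup sits inside a div with id "container", as described; `get_text(strip=True)` is swapped in for `.text` because bs4's `.text` keeps the whitespace between tags.

```python
from bs4 import BeautifulSoup

# Assumed wrapper: the question's markup placed inside <div id="container">.
html = '''<div id="container"><strong class="fullname js-action-profile-name">User Name</strong>
    <span>&rlm;</span>
    <span class="username js-action-profile-name"><s>@</s><b>UserName</b></span></div>'''

soup = BeautifulSoup(html, 'html.parser')

# strip=True trims whitespace around each text node and drops empty ones;
# the right-to-left mark (U+200F) is not whitespace, so it survives.
text = soup.find(id='container').get_text(strip=True)

# Swap the right-to-left mark for a newline.
result = text.replace('\u200f', '\n')
print(result)
```

No `convertEntities` argument is needed here, since bs4 decodes entities such as &rlm; while parsing.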

Upvotes: 1
