tarek hassan

Reputation: 802

How to skip <span> with Beautiful Soup

Here is the output of my code:

<h1 class="it-ttl" id="itemTitle" itemprop="name"><span class="g-hdn">Details about   </span>item name goes here</h1>

I want to get the item name only, without the "Details about" part.

My Python code that selects the element by its id is:

for content in soup.select('#itemTitle'):
    print(content.text)

Upvotes: 3

Views: 4491

Answers (4)

juanmhidalgo

Reputation: 1546

You can use decompose(), clear(), or extract(). According to the documentation:

Tag.decompose() removes a tag from the tree, then completely destroys it and its contents

Tag.clear() removes the contents of a tag

PageElement.extract() removes a tag or string from the tree. It returns the tag or string that was extracted

from bs4 import BeautifulSoup
html = '''<h1 class="it-ttl" id="itemTitle" itemprop="name"><span class="g-hdn">Details about   </span>item name goes here</h1>'''

soup = BeautifulSoup(html, 'lxml')
for content in soup.select('#itemTitle'):
    content.span.decompose()
    print(content.text)

Output:

item name goes here
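
To make the difference between the three calls concrete, here is a minimal sketch that runs each one on a fresh copy of the markup. The HTML is trimmed down from the question, and the variable names (after_decompose, removed, and so on) are only illustrative. I use 'html.parser' here so it runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Trimmed version of the markup from the question (assumed for illustration)
html = '<h1 id="itemTitle"><span class="g-hdn">Details about   </span>item name goes here</h1>'

# decompose(): the span is removed from the tree and destroyed
soup = BeautifulSoup(html, 'html.parser')
h1 = soup.select_one('#itemTitle')
h1.span.decompose()
after_decompose = h1.text

# extract(): the span is removed from the tree but returned to the caller
soup = BeautifulSoup(html, 'html.parser')
h1 = soup.select_one('#itemTitle')
removed = h1.span.extract()
after_extract = h1.text
removed_text = removed.text  # the removed span's text is still available

# clear(): the span stays in the tree, only its contents are emptied
soup = BeautifulSoup(html, 'html.parser')
h1 = soup.select_one('#itemTitle')
h1.span.clear()
after_clear = h1.text
span_still_present = h1.span is not None

print(after_decompose, after_extract, after_clear, sep='\n')
```

All three leave the same h1 text here. The choice mostly matters if you still need the removed span afterwards (extract()) or want an empty <span> to stay in the tree (clear()).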

Upvotes: 9

Austin

Reputation: 26039

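See if this works:

from bs4 import BeautifulSoup

# Pass a parser explicitly; omitting it makes bs4 emit a warning
soup = BeautifulSoup("""<h1 class="it-ttl" id="itemTitle" itemprop="name">
<span class="g-hdn">Details about  </span>
item name goes here</h1>""", 'html.parser')
# The last child of the <h1> is the text node that follows the <span>
print(soup.find('h1', {'class': 'it-ttl'}).contents[-1].strip())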

Upvotes: 0

briancaffey

Reputation: 2559

How about this:

from bs4 import BeautifulSoup
html = """<h1 class="it-ttl" id="itemTitle" itemprop="name"><span class="g-hdn">Details about   </span>item name goes here</h1>"""

soup = BeautifulSoup(html, "lxml")

text = soup.find('h1', attrs={"id":"itemTitle"}).text
span = soup.find('span', attrs={"class":"g-hdn"}).text

final_text = text[len(span):]

print(final_text)

This results in:

item name goes here

Upvotes: 2

Ali

Reputation: 1357

My answer is inspired by this accepted answer.

Code:

from bs4 import BeautifulSoup, NavigableString

data = '''
<h1 class="it-ttl" id="itemTitle" itemprop="name"><span class="g-hdn">Details about   </span>item name goes here</h1>
'''

soup = BeautifulSoup(data, 'html.parser')
inner_text = [element for element in soup.h1 if isinstance(element, NavigableString)]
print(inner_text)

Output:

['item name goes here']
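
If you want a single string rather than a list, the same filter can be joined and stripped. This is just a sketch under the same assumptions: the markup is trimmed from the question and direct_text is an illustrative name. Note that comments are also NavigableString instances in bs4, so they would be included if the tag contained any.

```python
from bs4 import BeautifulSoup, NavigableString

# Trimmed version of the markup from the question (assumed for illustration)
html = '<h1 id="itemTitle"><span class="g-hdn">Details about   </span>item name goes here</h1>'
soup = BeautifulSoup(html, 'html.parser')

h1 = soup.find(id='itemTitle')
# Keep only the text nodes that are direct children of the <h1>,
# skipping child tags such as the <span>; the tree is left unmodified
direct_text = ''.join(
    child for child in h1.children if isinstance(child, NavigableString)
).strip()
print(direct_text)
```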

Upvotes: 2
