Reputation: 802
Here is the output of my code:
<h1 class="it-ttl" id="itemTitle" itemprop="name"><span class="g-hdn">Details about </span>item name goes here</h1>
I want to get only the item name, without the "Details about" part.
My Python code that selects the element by its id is:
for content in soup.select('#itemTitle'):
    print(content.text)
Upvotes: 3
Views: 4491
Reputation: 1546
You can use decompose(), clear(), or extract(). According to the documentation:
Tag.decompose() removes a tag from the tree, then completely destroys it and its contents
Tag.clear() removes the contents of a tag
PageElement.extract() removes a tag or string from the tree. It returns the tag or string that was extracted
from bs4 import BeautifulSoup
html = '''<h1 class="it-ttl" id="itemTitle" itemprop="name"><span class="g-hdn">Details about </span>item name goes here</h1>'''
soup = BeautifulSoup(html, 'lxml')
for content in soup.select('#itemTitle'):
    # Remove the <span> (and its "Details about " text) from the tree
    content.span.decompose()
    print(content.text)
Output:
item name goes here
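For comparison, here is a minimal sketch of the other two methods applied to the same markup. The helper names title_after_extract and title_after_clear are made up for illustration, and 'html.parser' is used only so the sketch does not depend on lxml. extract() hands back the removed tag, while clear() empties the span but leaves the empty tag in the tree.

```python
from bs4 import BeautifulSoup

HTML = ('<h1 class="it-ttl" id="itemTitle" itemprop="name">'
        '<span class="g-hdn">Details about </span>item name goes here</h1>')


def title_after_extract(html):
    """Remove the span with extract(); return (removed text, remaining title)."""
    soup = BeautifulSoup(html, 'html.parser')
    h1 = soup.select_one('#itemTitle')
    removed = h1.span.extract()  # the span is detached but still usable
    return removed.get_text(), h1.get_text()


def title_after_clear(html):
    """Empty the span with clear(); return (span markup, remaining title)."""
    soup = BeautifulSoup(html, 'html.parser')
    h1 = soup.select_one('#itemTitle')
    h1.span.clear()  # the span stays in the tree, only its contents are gone
    return str(h1.span), h1.get_text()


print(title_after_extract(HTML))
print(title_after_clear(HTML))
```

If you still need the "Details about " text afterwards, extract() is probably the better fit; if you only want the title, decompose() is enough.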
Upvotes: 9
Reputation: 26039
See if this works:
from bs4 import BeautifulSoup
soup = BeautifulSoup("""<h1 class="it-ttl" id="itemTitle" itemprop="name">
<span class="g-hdn">Details about </span>
item name goes here</h1>""", "html.parser")
print(soup.find('h1', {'class': 'it-ttl'}).contents[-1].strip())
Upvotes: 0
Reputation: 2559
How about this:
from bs4 import BeautifulSoup
html = """<h1 class="it-ttl" id="itemTitle" itemprop="name"><span class="g-hdn">Details about </span>item name goes here</h1>"""
soup = BeautifulSoup(html, "lxml")
text = soup.find('h1', attrs={"id":"itemTitle"}).text
span = soup.find('span', attrs={"class":"g-hdn"}).text
final_text = text[len(span):]
print(final_text)
This results in:
item name goes here
Upvotes: 2
Reputation: 1357
My answer is inspired by this accepted answer.
Code:
from bs4 import BeautifulSoup, NavigableString
data = '''
<h1 class="it-ttl" id="itemTitle" itemprop="name"><span class="g-hdn">Details about </span>item name goes here</h1>
'''
soup = BeautifulSoup(data, 'html.parser')
inner_text = [element for element in soup.h1 if isinstance(element, NavigableString)]
print(inner_text)
Output:
['item name goes here']
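If you want a plain string rather than a list, one possible variation is to collect only the direct text children of the tag and join them. This is a sketch, not part of the answer above: direct_text is a hypothetical helper name, and it relies on find_all(string=True, recursive=False) returning only the strings that are immediate children of the tag.

```python
from bs4 import BeautifulSoup


def direct_text(tag):
    """Join the strings that are direct children of tag, skipping nested tags."""
    pieces = tag.find_all(string=True, recursive=False)
    return ''.join(pieces).strip()


data = '''
<h1 class="it-ttl" id="itemTitle" itemprop="name"><span class="g-hdn">Details about </span>item name goes here</h1>
'''
soup = BeautifulSoup(data, 'html.parser')
print(direct_text(soup.h1))
```

Unlike decompose(), this leaves the parsed tree untouched, which may matter if you read other parts of the page later.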
Upvotes: 2