Reputation: 781
I am trying to extract strings from an HTML file using BeautifulSoup. The query result includes the text of label tags nested inside the div. How can I get rid of those tags?
from bs4 import BeautifulSoup
import requests
with open('/Desktop/filename.html') as html_file:
    soup = BeautifulSoup(html_file, 'lxml')
string = soup.find('div', class_="col-sm-8 col-xs-6")
print(string)
Output:
<div class="col-sm-8 col-xs-6">
Sherlock Holmes <br>
<label for="AgentAddress" style="display: none;">
Detective's Address
</label>
221B Baker Street London <br>
<label for="AgentCityStateZip" style="display: none;">
City, State, Zip
</label>
London, United Kingdom
</div>
print(string.text)
outputs
Sherlock Holmes
Detective's Address
221B Baker Street London
City, State, Zip
London, United Kingdom
I am not interested in the text inside the <label></label> tags. How can I get rid of them so that the output is:
Sherlock Holmes
221B Baker Street London
London, United Kingdom
Upvotes: 1
Views: 25
Reputation: 161
You can try decompose(), which removes a tag and its contents from the tree. For example, before the print, use this:
for label_element in string.find_all("label"):
    label_element.decompose()
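Here is a minimal, self-contained sketch of the same idea. It assumes the markup from the question is inlined as a string rather than read from a file, and it uses the built-in "html.parser" instead of lxml so nothing beyond bs4 is needed. The helper name text_without_labels is just a hypothetical name for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question, inlined here instead of reading a file.
html = """
<div class="col-sm-8 col-xs-6">
Sherlock Holmes <br>
<label for="AgentAddress" style="display: none;">
Detective's Address
</label>
221B Baker Street London <br>
<label for="AgentCityStateZip" style="display: none;">
City, State, Zip
</label>
London, United Kingdom
</div>
"""


def text_without_labels(markup):
    """Return the div's text lines with every <label> (and its text) removed."""
    # Assumption: "html.parser" is used here; the question uses "lxml".
    soup = BeautifulSoup(markup, "html.parser")
    div = soup.find("div", class_="col-sm-8 col-xs-6")
    # decompose() deletes each label tag together with its contents.
    for label_element in div.find_all("label"):
        label_element.decompose()
    # stripped_strings yields the remaining text pieces without surrounding whitespace.
    return list(div.stripped_strings)


lines = text_without_labels(html)
print("\n".join(lines))
```

Note that decompose() modifies the tree in place, so the labels are gone from soup afterwards. If you still need them later, parse the file again or work on a copy.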
Upvotes: 1