pkj

Reputation: 781

BeautifulSoup: Dropping text inside tags

I am trying to extract strings from an HTML file using BeautifulSoup. The element my query returns contains <label> tags, and I want to get rid of those tags along with the text inside them.

from bs4 import BeautifulSoup

# Parse the saved HTML file with the lxml parser
with open('/Desktop/filename.html') as html_file:
    soup = BeautifulSoup(html_file, 'lxml')

# Find the <div> that holds the name and address
string = soup.find('div', class_="col-sm-8 col-xs-6")
print(string)

Output:

<div class="col-sm-8 col-xs-6">
    Sherlock Holmes <br>
    <label for="AgentAddress" style="display: none;">
        Detective's Address
    </label>
    221B Baker Street London <br>
    <label for="AgentCityStateZip" style="display: none;">
        City, State, Zip
    </label>
    London, United Kingdom            
</div>

print(string.text) outputs:

    Sherlock Holmes
    Detective's Address
    221B Baker Street London
    City, State, Zip
    London, United Kingdom 

I am not interested in the text inside the <label></label> tags. How can I get rid of them so that the output is:

    Sherlock Holmes
    221B Baker Street London
    London, United Kingdom 
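If you would rather not modify the parsed tree, one option is to keep only the text nodes that are direct children of the <div>. Text inside <label> is a grandchild of the <div>, so it is skipped. This is a minimal sketch that assumes the wanted lines always sit directly inside the <div>, as in the sample output above. The inline HTML and the "html.parser" parser are stand-ins for the original file and lxml.

```python
from bs4 import BeautifulSoup, NavigableString

# Stand-in for the saved file: the <div> from the output above
html = """
<div class="col-sm-8 col-xs-6">
    Sherlock Holmes <br>
    <label for="AgentAddress" style="display: none;">
        Detective's Address
    </label>
    221B Baker Street London <br>
    <label for="AgentCityStateZip" style="display: none;">
        City, State, Zip
    </label>
    London, United Kingdom
</div>
"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="col-sm-8 col-xs-6")

# Keep only text nodes that are direct children of the <div>,
# dropping whitespace-only nodes; text nested in <label> is never visited.
lines = [
    child.strip()
    for child in div.children
    if isinstance(child, NavigableString) and child.strip()
]
print("\n".join(lines))
```

Unlike decompose(), this leaves the <label> elements in the soup, which can help if other code still needs them.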

Upvotes: 1

Views: 25

Answers (1)

Cr4id3r

Reputation: 161

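You can use decompose(), which removes each <label> element, including its text, from the tree. For example, run this before the print:

# Remove every <label> (and its contents) from the <div>
for label_element in string.find_all("label"):
    label_element.decompose()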
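Putting the answer together, a self-contained sketch might look like the following. It uses an inline copy of the <div> instead of the file and "html.parser" instead of lxml, so it runs without the file or the lxml package (both are assumptions). get_text("\n", strip=True) strips each remaining text node and skips whitespace-only ones, which removes the leftover indentation and blank lines.

```python
from bs4 import BeautifulSoup

# Inline stand-in for the saved HTML file
html = """
<div class="col-sm-8 col-xs-6">
    Sherlock Holmes <br>
    <label for="AgentAddress" style="display: none;">
        Detective's Address
    </label>
    221B Baker Street London <br>
    <label for="AgentCityStateZip" style="display: none;">
        City, State, Zip
    </label>
    London, United Kingdom
</div>
"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="col-sm-8 col-xs-6")

# Remove every <label> element, including its text, from the tree
for label_element in div.find_all("label"):
    label_element.decompose()

# Join the remaining text nodes one per line, skipping whitespace-only ones
text = div.get_text("\n", strip=True)
print(text)
```

This prints the three lines asked for in the question. decompose() destroys the removed elements; if you still need them afterwards, extract() removes an element from the tree and returns it instead.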

Upvotes: 1
