How can I remove all different script tags in BeautifulSoup?

Question

I scrape a table from a web page and would like to rebuild the table with all script tags removed. Here is my source code.

import requests
from bs4 import BeautifulSoup

response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
table = soup.find('table')

for row in table.find_all('tr'):
    for col in row.find_all('td'):
        # remove all different script tags
        # col.replace_with('')
        # col.decompose()
        # col.extract()
        col = col.contents

How can I remove all the different script tags? Take the following cell as an example, which contains the tags a, br and td.

<td><a>Signal et Communication</a>
<br/>
<a>Ingénierie Réseaux et Télécommunications</a></td>

My expected result is:

Signal et Communication
Ingénierie Réseaux et Télécommunications

alecxe · Accepted Answer

You are asking about get_text():

If you only want the text part of a document or tag, you can use the get_text() method. It returns all the text in a document or beneath a tag, as a single Unicode string

td = soup.find("td")
td.get_text()
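As a minimal sketch of how this applies to a cell like the one in the question: get_text() also accepts separator and strip arguments, which help when the text pieces are split by tags such as br. The HTML string below is an assumed stand-in for the real page, with tag attributes omitted.

```python
from bs4 import BeautifulSoup

# Assumed sample markup modelled on the cell from the question
# (attributes omitted, wrapped in a table so any parser keeps the td).
html = (
    "<table><tr><td>"
    "<a>Signal et Communication</a><br/>"
    "<a>Ingénierie Réseaux et Télécommunications</a>"
    "</td></tr></table>"
)

soup = BeautifulSoup(html, "html.parser")
td = soup.find("td")

# Without arguments the text pieces are concatenated directly.
joined = td.get_text()

# separator puts a string between the pieces; strip trims each piece
# and skips pieces that are empty after trimming.
lines = td.get_text(separator="\n", strip=True)

print(joined)
print(lines)
```

Here joined runs the two labels together, while lines keeps them on separate lines, which is closer to the expected result in the question.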

Note that .string would return None in this case, since td has multiple children:

If a tag contains more than one thing, then it’s not clear what .string should refer to, so .string is defined to be None

Demo:

>>> from bs4 import BeautifulSoup
>>> 
>>> soup = BeautifulSoup(u"""
... <td><a>Signal et Communication</a>
... <br/><a>Ingénierie Réseaux et Télécommunications</a>
... </td>
... """, "html.parser")
>>> 
>>> td = soup.td
>>> print(td.string)
None
>>> print(td.get_text())
Signal et Communication
Ingénierie Réseaux et Télécommunications
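
Putting this back into the loop from the question, one possible way to rebuild the table is as a list of rows, each holding the plain text of its cells. This is only a sketch: table_to_rows is a hypothetical helper name, and the inline HTML is an assumed stand-in for response.text, since the real page is not shown.

```python
from bs4 import BeautifulSoup

# Assumed stand-in for response.text; the second cell is made up for illustration.
html = """
<table>
  <tr>
    <td><a>Signal et Communication</a><br/><a>Ingénierie Réseaux et Télécommunications</a></td>
    <td>plain text</td>
  </tr>
</table>
"""


def table_to_rows(table):
    """Return the table as a list of rows, each a list of cell texts."""
    rows = []
    for tr in table.find_all("tr"):
        # get_text() drops the markup inside each cell and keeps only the text.
        cells = [td.get_text(separator="\n", strip=True) for td in tr.find_all("td")]
        rows.append(cells)
    return rows


soup = BeautifulSoup(html, "html.parser")
rows = table_to_rows(soup.find("table"))
for row in rows:
    print(row)
```

Collecting the text into a new list avoids modifying the parsed tree, so methods such as decompose() or extract() are not needed for this purpose.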
