Drop certain rows from an HTML table using BeautifulSoup

Question

I have a very basic question and couldn't find an answer to it on SO. Suppose I have an HTML table as follows:

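html1 = """
<table>
<tr>
<th>Id</th>
<th>Month</th>
</tr>
<tr><td>1</td><td>January</td></tr>
<tr><td>2</td><td>February</td></tr>
<tr><td>3</td></tr>
<tr><td>4</td></tr>
<tr><td>5</td><td>October</td></tr>
<tr><td>6</td><td>December</td></tr>
<tr><td>7</td></tr>
<tr><td>Correct</td></tr>
</table>
"""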

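I want to drop every tr tag whose first td tag does not contain a digit and keep the rest of the table intact. I'm not sure if that makes sense, so here is the desired output:

<table>
<tr>
<th>Id</th>
<th>Month</th>
</tr>
<tr><td>1</td><td>January</td></tr>
<tr><td>2</td><td>February</td></tr>
<tr><td>3</td></tr>
<tr><td>4</td></tr>
<tr><td>5</td><td>October</td></tr>
<tr><td>6</td><td>December</td></tr>
<tr><td>7</td></tr>
</table>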

MendelG · Accepted Answer

To remove every <tr> whose first <td> is not a digit, check that the text of that <td> fails .isdigit() and then .extract() the row:

from bs4 import BeautifulSoup


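html1 = """
<table>
   <tbody>
      <tr>
         <th>Id</th>
         <th>Month</th>
      </tr>
      <tr>
         <td>1</td>
         <td>January</td>
      </tr>
      <tr>
         <td>2</td>
         <td>February</td>
      </tr>
      <tr>
         <td>3</td>
      </tr>
      <tr>
         <td>4</td>
      </tr>
      <tr>
         <td>5</td>
         <td>October</td>
      </tr>
      <tr>
         <td>6</td>
         <td>December</td>
      </tr>
      <tr>
         <td>7</td>
      </tr>
      <tr>
         <td>Correct</td>
      </tr>
   </tbody>
</table>
"""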

soup = BeautifulSoup(html1, "html.parser")

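# find_all() returns a list-like ResultSet, so rows can be removed while looping over it.
for tag in soup.find_all("tr"):
    # find_next("td") searches forward in document order, so a row with no <td>
    # of its own is judged by the first <td> of a later row.
    if not tag.find_next("td").text.isdigit():
        tag.extract()  # detach the row from the tree
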
print(soup.prettify())
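
A variant sketch, not part of the original answer: since find_next("td") looks forward through the whole document, it may be safer to inspect only each row's own cells. The hypothetical helper drop_non_numeric_rows below uses find("td") inside the row, leaves rows without any <td> untouched (for example a header row made of <th> cells), and strips surrounding whitespace before calling .isdigit(). The sample markup and the helper name are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup assumed for illustration: a <th> header row,
# numeric rows (one padded with spaces), and one stray row.
sample_html = """
<table>
  <tr><th>Id</th><th>Month</th></tr>
  <tr><td> 1 </td><td>January</td></tr>
  <tr><td>3</td></tr>
  <tr><td>Correct</td></tr>
</table>
"""


def drop_non_numeric_rows(soup):
    """Remove every <tr> whose own first <td> is not all digits (hypothetical helper)."""
    for row in soup.find_all("tr"):
        first_cell = row.find("td")  # searches only inside this row
        if first_cell is None:
            continue  # no <td> at all (e.g. a <th>-only header row): keep it
        if not first_cell.get_text(strip=True).isdigit():
            row.decompose()  # remove the row from the tree and destroy it
    return soup


soup = drop_non_numeric_rows(BeautifulSoup(sample_html, "html.parser"))
remaining = [row.get_text(" ", strip=True) for row in soup.find_all("tr")]
print(remaining)
```

decompose() discards the removed row entirely; extract() is the better fit if the removed rows are still needed afterwards, since it returns the detached tag.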
