Extracting HTML from outside the <table> tag

Question

I'm trying to extract the HTML that is located above and below a <table> tag, so for example from the sample HTML below:

sample_html = """

<b>Main Title</b>
more
stuff
in here!

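<table>
<caption>Windows</caption>
<tr><th>Type</th><th>Issue</th><th>Restart</th><th>Severity</th><th>Impact</th></tr>
<tr><td>some item</td><td>some website</td><td>Yes</td><td>Critical</td><td>stuff</td></tr>
<tr><td>some item</td><td>some website</td><td>Yes</td><td>Important</td><td>stuff</td></tr>
</table>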


AGAIN
more
stuff
down here!

"""

I would like to obtain something like:

top_html = """

<b>Main Title</b>
more
stuff
in here!

"""

bottom_html = """

AGAIN
more
stuff
down here!

"""

Or already in text format, like:

top_html = 'Main Title more stuff in here!'

bottom_html = 'AGAIN more stuff down here!'

So far I've been able to extract the <table> part from the whole HTML and do my processing (I separate the rows and columns so I can extract the values I need), with the following code:

from bs4 import BeautifulSoup
soup = BeautifulSoup(input_html, "html.parser")
table = soup.find('table')

Adeyinka Badmus · Accepted Answer

This solution doesn't make extensive use of BeautifulSoup, but it works: find the indexes of the opening and closing table tags in the HTML string, then take the substrings before and after them.

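from bs4 import BeautifulSoup

soup = BeautifulSoup(sample_html, "html.parser")

def extract_top_and_bottom(soup):
    # Note: `soup` is the document as a plain string here, not a BeautifulSoup object.
    index_of_opening_tag = soup.index("<table>")
    index_of_closing_tag = soup.index("</table>")

    # Everything before the opening tag, and everything after the closing tag.
    top_html = soup[:index_of_opening_tag]
    bottom_html = soup[index_of_closing_tag:].replace("</table>", '')

    print(top_html)
    print(bottom_html)

extract_top_and_bottom(str(soup))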
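
One caveat with searching for the literal opening tag: it will not be found if the tag carries attributes (for example `<table class="x">`) or is written in uppercase. Below is a minimal sketch of the same string-splitting idea using the standard-library `re` module instead. The names `split_around_table`, `OPEN_TABLE` and `CLOSE_TABLE` are made up for this illustration and are not part of the answer above. It assumes a single, non-nested table and attribute values that contain no `>` character.

```python
import re

# Opening <table> tag with or without attributes, and the closing tag.
# Hypothetical helper names, chosen only for this sketch.
OPEN_TABLE = re.compile(r"<table\b[^>]*>", re.IGNORECASE)
CLOSE_TABLE = re.compile(r"</table\s*>", re.IGNORECASE)


def split_around_table(html):
    """Return (top, bottom): the markup before and after the first table.

    Assumes a single, non-nested table. Returns (html, "") if no table is found.
    """
    opening = OPEN_TABLE.search(html)
    if opening is None:
        return html, ""
    # Look for the closing tag only after the opening tag.
    closing = CLOSE_TABLE.search(html, opening.end())
    if closing is None:
        # Unclosed table: treat everything after the opening tag as table content.
        return html[:opening.start()], ""
    return html[:opening.start()], html[closing.end():]


example = '<b>Main Title</b> in here! <table class="x"><tr><td>a</td></tr></table> AGAIN down here!'
top_html, bottom_html = split_around_table(example)
print(top_html)
print(bottom_html)
```

If plain text is wanted rather than HTML, each returned part could then be parsed with BeautifulSoup again and passed through `get_text()`.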
