theprowler

Reputation: 3610

BeautifulSoup - get string between two tags

I would like to capture everything between two HTML tags using BeautifulSoup.

This is the snippet of HTML code that I am concerned with:

<br>NEFS VII &amp; VIII Manager<br>

So, even with my limited understanding of HTML, I can see that I need to find the <br> tags and get the content between them. My question appears to be similar to this one (Python HTML Parsing Between two tags), where the solution is to use soup.find('br').next_sibling. When I try that myself, though, I run into this error:

AttributeError: 'ResultSet' object has no attribute 'next_sibling'.

Here is my relevant code:

import email
from bs4 import BeautifulSoup

with open(file_path) as in_f:
    msg = email.message_from_file(in_f)

html_msg = msg.get_payload(1)
body = html_msg.get_payload(decode=True)
html = body.decode()

br_tags = BeautifulSoup(html).find_all('br')
print("br_tags:", br_tags)
new_tags = BeautifulSoup(html).find_all('br').next_sibling
print("new_tags:", new_tags)
content = br_tags.string
print("content:", content)

The command print("br_tags:", br_tags) simply prints a list of seven <br/> tags. Both the .next_sibling call and the .string call raise an AttributeError like the one above.

I'm a novice with BeautifulSoup, so I'm probably misunderstanding how it is used, but I'd appreciate any help solving this. Thanks.

EDIT:

Larger chunk of HTML:

$0.30</span><o:p></o:p></p></td><td style='padding:0in 0in 0in 0in;height:15.0pt'></td><td style='padding:0in 0in 0in 0in;height:15.0pt'><p class=MsoNormal align=right style='text-align:right'><span style='font-size:10.0pt'>$492.30</span><o:p></o:p></p></td></tr><tr style='height:15.0pt'><td style='padding:0in 0in 0in 0in;height:15.0pt'><p class=MsoNormal><span style='font-size:10.0pt'>GB WINTER FLOUNDER</span><o:p></o:p></p></td><td style='padding:0in 0in 0in 0in;height:15.0pt'></td><td style='padding:0in 0in 0in 0in;height:15.0pt'></td><td style='padding:0in 0in 0in 0in;height:15.0pt'><p class=MsoNormal align=right style='text-align:right'><span style='font-size:10.0pt'>95,659</span><o:p></o:p></p></td><td style='padding:0in 0in 0in 0in;height:15.0pt'></td><td style='padding:0in 0in 0in 0in;height:15.0pt'><p class=MsoNormal align=right style='text-align:right'><span style='font-size:10.0pt'>$0.25</span><o:p></o:p></p></td><td style='padding:0in 0in 0in 0in;height:15.0pt'></td><td style='padding:0in 0in 0in 0in;height:15.0pt'><p class=MsoNormal align=right style='text-align:right'><span style='font-size:10.0pt'>$23,914.75</span><o:p></o:p></p></td></tr></table><p style='margin-bottom:12.0pt'><span style='font-family:"Arial","sans-serif";color:black'><o:p>&nbsp;</o:p></span></p><div><p class=MsoNormal><span style='font-family:"Arial","sans-serif";color:black'>Linda McCann<br>NEFS VII &amp; VIII Manager<br>

Upvotes: 1

Views: 1315

Answers (1)

Dmitriy Fialkovskiy

Reputation: 3235

The error message itself tells you what is wrong: 'ResultSet' object has no attribute 'next_sibling'. A ResultSet is the type that find_all() returns.

The AttributeError appears because your script calls find_all() instead of find(). find() returns a single tag, which does have next_sibling:

new_tags = BeautifulSoup(html).find_all('br').next_sibling # yours
new_tags = BeautifulSoup(html).find('br').next_sibling # correct
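To see the difference between the two calls, here is a minimal sketch. It assumes bs4 is installed and uses the built-in html.parser, which is my choice of parser and not something from the question. The sample string is a shortened version of the HTML you posted:

```python
from bs4 import BeautifulSoup

# Shortened sample based on the HTML in the question.
# "html.parser" is an assumed parser choice (it ships with Python).
html = "<span>Linda McCann<br>NEFS VII &amp; VIII Manager<br></span>"
soup = BeautifulSoup(html, "html.parser")

all_brs = soup.find_all("br")   # ResultSet: a list-like collection of Tag objects
first_br = soup.find("br")      # a single Tag (or None if nothing matches)

print(type(all_brs).__name__)   # ResultSet
print(type(first_br).__name__)  # Tag
print(first_br.next_sibling)    # NEFS VII & VIII Manager
print(all_brs[0].next_sibling)  # indexing the ResultSet also gives a Tag
```

So .next_sibling works on each element of the ResultSet, just not on the ResultSet as a whole.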

To collect the text that follows every br tag, loop over the result of find_all(), for example:

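soup = BeautifulSoup(html, 'html.parser')  # parse once and reuse; the parser choice is an assumption

br_list = []
for i in soup.find_all('br'):
    br_list.append(i.next_sibling)  # the node right after each <br>; can be None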
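Keep in mind that next_sibling is not always text: for the last br in a block it is None, and it can also be another tag. A possible variation, as a sketch only, keeps just the non-empty text nodes. The sample markup is again a shortened version of the question's HTML, and html.parser is an assumed parser choice:

```python
from bs4 import BeautifulSoup, NavigableString

# Shortened sample based on the HTML in the question.
html = "<div><p><span>Linda McCann<br>NEFS VII &amp; VIII Manager<br></span></p></div>"
soup = BeautifulSoup(html, "html.parser")

br_texts = []
for br in soup.find_all("br"):
    sibling = br.next_sibling
    # Skip None (nothing after the <br>), other tags, and whitespace-only strings.
    if isinstance(sibling, NavigableString) and sibling.strip():
        br_texts.append(sibling.strip())

print(br_texts)  # ['NEFS VII & VIII Manager']
```

If you also need the text in front of the first br ("Linda McCann" here), br.previous_sibling gives you that in the same way.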

Upvotes: 2
