Returning body text using BeautifulSoup

Question

I'm trying to use BeautifulSoup to strip the HTML tags from an email body returned by exchangelib. What I have so far is this:

from exchangelib import Credentials, Account
import urllib3
from bs4 import BeautifulSoup

credentials = Credentials('myemail@notreal.com', 'topSecret')
account = Account('myemail@notreal.com', credentials=credentials, autodiscover=True)

for item in account.inbox.all().order_by('-datetime_received')[:1]:
    soup = BeautifulSoup(item.unique_body, 'html.parser')
    print(soup)

As is, this uses exchangelib to grab the most recent email from my inbox and prints its unique_body, which contains the body text of the email. Here is a sample of the output from print(soup):



Hey John,
 
Here is a test email

My end goal is to have it print:

Hey John,
Here is a test email

From what I'm reading in the BeautifulSoup documentation, the scraping step goes between my "soup =" line and the final print line.

My issue is that the scraping examples I've found all seem to require a specific tag and class, such as name_box = soup.find('h1', attrs={'class': 'name'}), but the HTML I have contains neither.

As someone who is new to Python, how should I go about doing this?

KunduK · Accepted Answer

You can use find_all to get all the font tags, then iterate over them and print the text of each one.

from bs4 import BeautifulSoup
html="""

Hey John,
 
Here is a test email


"""

soup = BeautifulSoup(html, "html.parser")
for font in soup.find_all('font'):
    print(font.text)

Output:

Hey John,

Here is a test email
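
If you would rather not depend on a particular tag name, another option is to let BeautifulSoup collect every text node for you. The following is only a minimal sketch: the sample_html markup and the html_to_text helper are made up for illustration (the real unique_body will look different), and it assumes you want one output line per non-empty piece of text.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for item.unique_body;
# the real email body will have different tags and attributes.
sample_html = """
<html><body>
<div><font size="2">Hey John,</font></div>
<div><font size="2">&nbsp;</font></div>
<div><font size="2">Here is a test email</font></div>
</body></html>
"""


def html_to_text(html):
    """Return the visible text of an HTML string, one non-empty piece per line."""
    soup = BeautifulSoup(html, "html.parser")
    # stripped_strings yields every text node with surrounding whitespace
    # removed and skips nodes that are empty after stripping.
    return "\n".join(soup.stripped_strings)


print(html_to_text(sample_html))
```

In the question's loop this would be print(html_to_text(item.unique_body)). Because it never looks up a tag by name, it should keep working if the mail client wraps the text in something other than font tags, at the cost of also picking up any text you might not want (signatures, quoted replies and so on).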
