beautiful soup regex

Question

I just ran the following code in Python to extract certain emails from an IMAP folder. The extraction part works fine and the BeautifulSoup part works okay, but the output has a lot of '\r' and '\n' in it.

I tried to remove these with the re.sub function, but it's not working, and it doesn't even give an error message. Any idea what is wrong? I am attaching the code. Please note that this is not the complete code, but everything above what I'm posting works okay. It still prints the output and it's "prettified", but the '\r' and '\n' are still there. I have tried find_all() as well, but that doesn't work either.

mail.list()  # Lists all labels in GMail
mail.select('INBOX/Personal')  # Connected to inbox.

resp, items = mail.search(None, '(SEEN)')

items = items[0].split()  # get the mail ids
for emailid in items:
    # getting the mail content
    resp, data = mail.fetch(emailid, '(UID BODY[TEXT])')
    text = str(data[0])  # [1] don't forget to add this back
    soup = bs(text, 'html.parser')
    soup = soup.prettify()
    soup = re.sub('\r\n', '', soup)

print(soup)

MasOOd.KamYab · Accepted Answer

You can do this with a single regex statement:

soup = re.sub('\r*\n*', '', soup)

Or you can use two separate calls:

soup = re.sub('\r', '', soup)
soup = re.sub('\n', '', soup)
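
As a rough sketch, a character class can also strip both characters in one pass. The sample string here is made up for illustration and is not taken from the question's mailbox:

```python
import re

# Made-up sample containing real carriage returns and line feeds.
sample = "<p>\r\n Hello\r\n</p>\n"

# [\r\n]+ matches any run of CR and/or LF characters, in any order.
cleaned = re.sub(r'[\r\n]+', '', sample)

print(repr(cleaned))
```

This only helps when the string holds real CR/LF characters; it does nothing to literal backslash sequences.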

https://regexr.com/3nnp1
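
One likely reason the original re.sub call silently does nothing, assuming Python 3: imaplib returns bytes, and calling str() on bytes (or on a tuple holding bytes) produces the repr text, in which line endings show up as the literal two-character sequences backslash-r and backslash-n. The pattern '\r\n' looks for real control characters, so it never matches. A minimal sketch, where `raw` is a hypothetical payload standing in for data[0][1] and the utf-8 encoding is an assumption:

```python
import re

# Hypothetical payload standing in for data[0][1] from mail.fetch().
raw = b"<p>Hello</p>\r\n<p>World</p>\r\n"

# str() on bytes gives the repr text: line endings become literal
# backslash sequences, not real CR/LF characters.
as_repr = str(raw)

# Decoding gives a normal string with real CR/LF characters.
# The utf-8 encoding is an assumption; real mail may use another charset.
decoded = raw.decode("utf-8", errors="replace")

unchanged = re.sub('\r\n', '', as_repr)  # no real CR/LF to match
cleaned = re.sub('\r\n', '', decoded)    # line endings removed

print(unchanged == as_repr)
print(cleaned)
```

If that is what is happening, decoding the message body before passing it to BeautifulSoup should let either regex above do its job.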
