Reputation: 111
I ran the following Python code to pull certain emails out of an IMAP folder. The extraction works fine and the BeautifulSoup part works okay, but the output contains a lot of '\r' and '\n' sequences.
I tried to remove them with re.sub, but it has no effect and does not even raise an error. Any idea what is wrong? The code is below. Note that it is not the complete script, but everything above the part I'm posting works. The output still prints and is "prettified", but the \r and \n are still there. I also tried find_all(), but that doesn't work either.
mail.list()  # Lists all labels in GMail
mail.select('INBOX/Personal')  # Connected to inbox.
resp, items = mail.search(None, '(SEEN)')
items = items[0].split()  # getting the mails id
for emailid in items:
    # getting the mail content
    resp, data = mail.fetch(emailid, '(UID BODY[TEXT])')
    text = str(data[0])  # [1] don't forget to add this back
    soup = bs(text, 'html.parser')
    soup = soup.prettify()
    soup = re.sub('\\r\\n', '', soup)
    print(soup)
Upvotes: 7
Views: 447
Reputation: 7221
What about using the replace method directly? Since it is not a regex, it should be faster.
soup = soup.replace("\n", "").replace("\r", "")
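One thing to watch with this approach: Python strings are immutable, so replace returns a new string and the result has to be assigned to something. A minimal sketch, using a made-up sample string in place of the real prettified output:

```python
# Hypothetical sample text standing in for the prettified HTML
soup = "<p>\r\n Hello\r\n</p>\r\n"

# Calling replace without assigning the result leaves soup untouched
soup.replace("\n", "")
still_has_newlines = "\n" in soup

# Assigning the result keeps the cleaned string
cleaned = soup.replace("\r", "").replace("\n", "")
print(repr(cleaned))
```

Printing with repr makes any leftover control characters visible instead of letting the terminal render them.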
Upvotes: 2
Reputation: 984
You can use this one-line regex, which removes every carriage return and newline in a single pass:
soup = re.sub(r'[\r\n]+', '', soup)
or you can use this:
soup = re.sub('\\r', '', soup)
soup = re.sub('\\n', '', soup)
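If none of these patterns changes the output, one possible cause (an assumption, since the full script is not shown) is that the code runs on Python 3, where imaplib returns bytes. Calling str() on bytes produces their repr, in which line endings appear as a literal backslash followed by r or n rather than as real control characters, so a pattern that looks for real CR or LF characters finds nothing. A small sketch with a made-up payload in place of the fetched message:

```python
import re

# Hypothetical payload standing in for the bytes returned by mail.fetch()
raw = b"<p>Hello</p>\r\n<p>World</p>\r\n"

# str() on bytes gives the repr: b'...' with literal backslash sequences
as_repr = str(raw)
unchanged = re.sub('\\r\\n', '', as_repr)  # no real CR LF to match

# Decoding yields real CR and LF characters that the regex can remove
decoded = raw.decode('utf-8')  # assumes the body is UTF-8
cleaned = re.sub(r'[\r\n]+', '', decoded)

print(unchanged == as_repr)
print(cleaned)
```

In the question's code this would mean decoding the message bytes before handing them to BeautifulSoup instead of wrapping them in str(). The utf-8 encoding here is a guess; real messages declare their own charset.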
Upvotes: 4