Zeynel

Reputation: 13515

How to eliminate email formatting in received email?

I am practicing sending emails with Google App Engine with Python. This code checks to see if message.sender is in the database:

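from google.appengine.ext.webapp.mail_handlers import InboundMailHandler

class ReceiveEmail(InboundMailHandler):
    def receive(self, message):
        # Look for a User entity whose userEmail equals the raw sender string.
        querySender = User.all()
        querySender.filter("userEmail =", message.sender)
        senderInDatabase = None
        for match in querySender:
            senderInDatabase = match.userEmail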

This works in the development server because I send the email as "[email protected]" and message.sender="[email protected]".

But I realized that in the production server emails come formatted as "az <[email protected]>", and my code fails because now message.sender="az <[email protected]>" while the email in the database is simply "[email protected]".

I searched for how to do this with regex and it is possible but I was wondering if I can do this with Python lists? Or, what do you think is the best way to achieve this result? I need to take just the email address from the message.sender.

App Engine documentation acknowledges the formatting but I could not find a specific way to select the email address only.

Thanks!

EDIT2 (re: Forest answer)

@Forest: parseaddr() appears to be simple enough:

>>> from email.utils import parseaddr
>>> e = "az <[email protected]>"
>>> parsed = parseaddr(e)
>>> parsed
('az', '[email protected]')
>>> parsed[1]
'[email protected]'
>>>

But this still does not cover the other type of formatting that you mention: [email protected] (Full Name)

>>> e2 = "<[email protected]> az"
>>> parsed2 = parseaddr(e2)
>>> parsed2
('', '[email protected]')
>>>

Is there really a format where the full name comes after the email address?

EDIT (re: Adam Bernier answer)

My attempt at explaining how the regex works (probably not correct):

r    # raw string
<     # first limit character
(     # what is inside () is matched     
[       # indicates a set of characters
^         # start of string
>         # start with this and go backward?
]       # end set of characters
+       # repeat the match
)     # end group
>    # end limit character

Upvotes: 0

Views: 285

Answers (2)

ʇsәɹoɈ

Reputation: 23479

Rather than storing the entire contents of a To: or From: header field as an opaque string, why don't you parse the incoming email and store the email address separately from the full name? See email.utils.parseaddr(). This way you don't have to use complicated, slow pattern matching when you want to look up an address. You can always reassemble the fields using email.utils.formataddr().
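A minimal sketch of that approach, using only the standard library. The addresses are placeholders made up for illustration (the ones in the question are obscured), and extract_address is a hypothetical helper name, not part of App Engine or the email package. It also shows that parseaddr() copes with the "address (Full Name)" form, where the name is a trailing comment:

```python
from email.utils import formataddr, parseaddr


def extract_address(sender):
    """Return only the address part of a From:/To: header value.

    Hypothetical helper; parseaddr() returns a (realname, address) tuple.
    """
    realname, address = parseaddr(sender)
    return address


# Placeholder addresses for illustration only.
print(extract_address("az <az@example.com>"))         # az@example.com
print(extract_address("az@example.com"))              # az@example.com
print(extract_address("az@example.com (Full Name)"))  # az@example.com

# The header form can be rebuilt from the two parts when needed.
print(formataddr(("az", "az@example.com")))           # az <az@example.com>
```

In the handler from the question, the lookup would then presumably filter on extract_address(message.sender) rather than on message.sender itself.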

Upvotes: 5

mechanical_meat

Reputation: 169334

If you want to use a regex, try something like this:

>>> import re
>>> email_string = "az <[email protected]>"
>>> re.findall(r'<([^>]+)>', email_string)
['[email protected]']

Note that the above regex handles multiple addresses...

>>> email_string2 = "az <[email protected]>, bz <[email protected]>"
>>> re.findall(r'<([^>]+)>', email_string2)
['[email protected]', '[email protected]']

but this simpler regex doesn't:

>>> re.findall(r'<(.*)>', email_string2)
['[email protected]>, bz <[email protected]'] # matches too much
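To see why the first pattern stops at each closing bracket, here is the same regex written out with re.VERBOSE so every piece carries a comment. This is only an annotated sketch: the addresses are placeholders and the name ADDRESS_IN_BRACKETS is made up for illustration. The key point is that a ^ placed first inside [...] negates the set; it does not mean "start of string" there.

```python
import re

# Same pattern as r'<([^>]+)>', spread out with comments via re.VERBOSE.
ADDRESS_IN_BRACKETS = re.compile(
    r"""
    <        # a literal '<'
    (        # start of the capture group (what findall returns)
    [^>]+    # one or more characters that are NOT '>'
    )        # end of the capture group
    >        # a literal '>'
    """,
    re.VERBOSE,
)

# Placeholder addresses for illustration only.
sample = "az <az@example.com>, bz <bz@example.com>"
print(ADDRESS_IN_BRACKETS.findall(sample))
# ['az@example.com', 'bz@example.com']
```

Because [^>]+ can never consume a '>', each match ends at the first closing bracket, whereas .* is greedy and runs on to the last one.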

Using slices (which I think you meant to say instead of "lists") seems more convoluted, e.g.:

>>> email_string[email_string.find('<')+1:-1]
'[email protected]'

and if multiple:

>>> email_strings = email_string2.split(',')
>>> for s in email_strings:
...   s[s.find('<')+1:-1]
...
'[email protected]'
'[email protected]'
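For the multiple-address case, the standard library's email.utils.getaddresses() may be a sturdier alternative to splitting on commas, since the slice above assumes every piece ends with '>'. A short sketch, again with placeholder addresses:

```python
from email.utils import getaddresses

# Placeholder header value for illustration only.
header_value = "az <az@example.com>, bz <bz@example.com>"

# getaddresses() takes a list of header values and returns
# (realname, address) tuples; keep just the addresses.
addresses = [address for _name, address in getaddresses([header_value])]
print(addresses)
# ['az@example.com', 'bz@example.com']
```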

Upvotes: 0
