mrooney

Reputation: 2022

How do I include chats in a Gmail IMAP search from Python's imaplib?

I'm happily using imaplib to get the message IDs in a specific label:

connection.select("MyLabel")
connection.uid('SEARCH', None, 'ALL')

but any chats in that label aren't returned, so they are effectively invisible to IMAP. I've read Accessing Chat Folder in Python Using Imaplib, but that covers searching in the Chats label, not finding chats in another label, and it doesn't appear to make this case work.

I could perhaps perform a second search in "Chats" for messages labelled "MyLabel", but that is an extra query and asks for quite a bit of setup from users of my application.

Upvotes: 3

Views: 1843

Answers (1)

dnozay

Reputation: 24304

Gmail labels are exposed as top-level mailboxes, not the other way around. To search multiple mailboxes you need multiple queries: call select() on each mailbox, then run the search command (through uid in your case).

Configuring your Gmail account for access to Chats over IMAP:

The question you linked, Accessing Chat Folder in Python Using Imaplib, is still very relevant, as users will need to allow IMAP access to their chat logs. You can also check the IMAP extensions used by Gmail, which describe X-GM-RAW and X-GM-LABELS.

If you are using Gmail for business, I am not sure if it works (I don't have an account to verify), but this link: https://developers.google.com/gmail/imap_extensions#checking_for_the_presence_of_extensions may help you see if the extensions are present.

Modified UTF-7 encoding:

Most IMAP servers store mailbox names and labels in a modified version of UTF-7, so you can't pass plain label strings to Gmail unless they are pure US-ASCII. IMAPClient knows how to encode and decode this modified UTF-7. There is a bug open against imaplib, so you may want to use the imapclient.imap_utf7 module to encode mailbox names and labels until imaplib supports modified UTF-7 on its own.

Something else I found online: you may be able to STORE labels successfully with a particular encoding, yet fail miserably to SEARCH for them (also when xoauth is involved) unless you use the modified UTF-7 encoding or indicate the charset. Other projects already do most of the work for Gmail, e.g. BaGoMa (backup google mail), which ships with imap-utf7 support. So far, I've been able to create a label through the UI with a latin-1 character and SEARCH for it using the UTF-8 charset.

Here is how to encode your label:

from imapclient import imap_utf7
label = imap_utf7.encode(u'yourlabel')

See also this question: IMAP folder path encoding (IMAP UTF-7) for Python
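If you'd rather not depend on IMAPClient just for this step, the encoding itself is small. Below is a minimal sketch of the modified UTF-7 encoding described in RFC 3501, section 5.1.3, assuming Python 3 strings. encode_imap_utf7 is a hypothetical helper name (it is not part of imaplib or IMAPClient), and it only encodes; decoding is left out.

```python
import base64


def encode_imap_utf7(text):
    """Encode a str as IMAP modified UTF-7 (RFC 3501, section 5.1.3).

    Hypothetical helper for illustration; not part of imaplib or IMAPClient.
    """
    out = []
    pending = []  # run of characters that must be base64-encoded

    def flush():
        # Emit the pending run as "&" + modified base64 of UTF-16BE + "-".
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii")
            # Modified base64: no "=" padding, "," instead of "/".
            out.append("&" + b64.rstrip("=").replace("/", ",") + "-")
            del pending[:]

    for ch in text:
        if 0x20 <= ord(ch) <= 0x7E:
            # Printable US-ASCII passes through, except "&" which becomes "&-".
            flush()
            out.append("&-" if ch == "&" else ch)
        else:
            pending.append(ch)
    flush()
    return "".join(out)
```

With this sketch, encode_imap_utf7("français") gives "fran&AOc-ais", and a pure ASCII label such as "MyLabel" comes back unchanged, which is why the problem only shows up with non-ASCII labels.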

You can inspect your labels with:

>>> sock.select("[Gmail]/Chats", True)
>>> sock.uid('FETCH', '1:*', 'X-GM-LABELS')

This is useful to check what labels you have and for debugging encoding problems.
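To turn that FETCH output into a list of label names, something like the following may help. It is only a sketch: parse_gm_labels is a hypothetical helper, the sample line is made up for illustration rather than captured from a server, and it assumes each response item is a single line with labels as atoms or quoted strings. Labels that the server sends as literals (imaplib returns those as tuples) are not handled.

```python
import re

# The parenthesised list after X-GM-LABELS: quoted strings or plain characters.
LABELS_RE = re.compile(r'X-GM-LABELS \(((?:"(?:[^"\\]|\\.)*"|[^()"])*)\)')
# One label: either a quoted string or a bare atom such as \Inbox.
TOKEN_RE = re.compile(r'"((?:[^"\\]|\\.)*)"|([^\s"]+)')


def parse_gm_labels(line):
    """Return the labels found in one FETCH response line (hypothetical helper)."""
    if isinstance(line, bytes):
        line = line.decode("ascii", "replace")
    match = LABELS_RE.search(line)
    if match is None:
        return []
    labels = []
    for quoted, atom in TOKEN_RE.findall(match.group(1)):
        if quoted:
            # Undo backslash escapes inside a quoted string.
            labels.append(re.sub(r"\\(.)", r"\1", quoted))
        else:
            labels.append(atom)
    return labels


# Illustrative input only, shaped like one item of the data list from imaplib.
sample = b'1 (X-GM-LABELS (\\Inbox Important "Muy Importante") UID 14)'
# parse_gm_labels(sample) returns ['\\Inbox', 'Important', 'Muy Importante']
```

Non-ASCII labels should show up here in their modified UTF-7 form, which makes this a quick way to see what the server actually stores.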

Example:

import imaplib
import getpass
import atexit
from imapclient import imap_utf7

def find_messages(sock, label):
    mailbox = imap_utf7.encode(label)
    label = imap_utf7.encode(label.encode('utf-8'))
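    # select() raises only on a BAD response; a missing mailbox comes back
    # as a non-OK status, so check the status as well.
    try:
        # process regular mailbox
        resp, data = sock.select(mailbox)
    except sock.error:
        resp = 'NO'
    if resp == 'OK':
        resp, data = sock.uid('SEARCH', None, '(ALL)')
        assert resp == 'OK'
        for uid in data[0].split():
            # because we do select, this uid will be valid.
            yield uid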
    try:
        # now process chats with that label
        resp, data = sock.select("[Gmail]/Chats", True)
    except sock.error:
        resp = 'NO'
    # a non-OK status most likely means access to chats via IMAP is disabled
    if resp == 'OK':
        # resp, data = sock.uid('SEARCH', 'X-GM-RAW', 'label:%s' % label)
        # the label is sent as a literal after the X-GM-LABELS key
        sock.literal = label
        resp, data = sock.uid('SEARCH', 'CHARSET', 'UTF-8', 'X-GM-LABELS')
        assert resp == 'OK'
        for uid in data[0].split():
            # because we do select, this uid will be valid.
            yield uid

def test():
    email = "[email protected]"
    label = u"français" # oui oui merci beaucoup.
    sock = imaplib.IMAP4_SSL("imap.gmail.com", 993)
    sock.login(email, getpass.getpass())
    for uid in find_messages(sock, label):
        # e.g.
        print(sock.uid('FETCH', uid, '(BODY[HEADER])'))
    sock.close()
    sock.logout()

tested on my machine!

>>> test()
Password: 
('OK', [('1 (UID 14 BODY[HEADER] {398}', 'MIME-Version: 1.0\r\nReceived: by 10.XXX.XXX.XXX with HTTP; Thu, 11 Jul 2013 09:54:32 -0700 (PDT)\r\nDate: Thu, 11 Jul 2013 09:54:32 -0700\r\nDelivered-To: [email protected]\r\nMessage-ID: <[email protected]>\r\nSubject: test email\r\nFrom: Damien <[email protected]>\r\nTo: Damien <[email protected]>\r\nContent-Type: text/plain; charset=ISO-8859-1\r\n\r\n'), ')'])
('OK', [('1 (UID 1 BODY[HEADER] {47}', 'From: Damien XXXXXXXX <[email protected]>\r\n\r\n'), ')'])
('OK', [('2 (UID 2 BODY[HEADER] {46}', 'From: Vincent XXXXXXXX <[email protected]>\r\n\r\n'), ')'])

Undocumented interface:

imaplib can send literals, which is particularly useful when using a different encoding. This works by setting the IMAP4.literal attribute before running the command.

sock.literal = label
resp, data = sock.uid('SEARCH', 'CHARSET', 'UTF-8', 'X-GM-LABELS')

Upvotes: 2
