Biswankar Das

Reputation: 305

Download Google Drive attachments of an email using the Gmail API in Python

I currently use this solution to download attachments from Gmail using the Gmail API in Python. However, whenever an attachment exceeds 25 MB, it is automatically uploaded to Google Drive and the file is linked in the mail. In such cases, there is no attachmentId in the message; I can only see the file names in the 'snippet' section of the message.

Is there any way I can download the Google Drive attachments from the mail?

There is a similar question posted here, but no solution has been provided for it yet.

Upvotes: 1

Views: 1487

Answers (2)

Harsh Chaudhary

Reputation: 1

The Drive API also has a limitation: it can only download 10 MB.

Upvotes: 0

iansedano

Reputation: 6481

How to download a Drive "attachment"

The "attachment" referred to is actually just a link to a Drive file, so confusingly it is not an attachment at all, but just text or HTML.

The issue here is that since it's not an attachment as such, you won't be able to fetch this with the Gmail API by itself. You'll need to use the Drive API.

To use the Drive API you'll need the file ID, which appears in the HTML content part of the message (among other parts).

You can use the re module to perform a findall on the HTML content. I used the following regex pattern to recognize Drive links:

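(?<=https:\/\/drive\.google\.com\/file\/d\/)[^\/]+(?=\/view\?usp=drive_web)

Here is a sample Python function to get the file IDs. It returns a list, which is empty if no Drive links are found.

import base64
import re

# [^\/]+ stops at the next slash, so each link yields its own ID even when
# the message contains several Drive links.
DRIVE_LINK_PATTERN = r'(?<=https:\/\/drive\.google\.com\/file\/d\/)[^\/]+(?=\/view\?usp=drive_web)'

def get_file_ids(service, user_id, msg_id):
    message = service.users().messages().get(userId=user_id, id=msg_id).execute()
    file_ids = []
    # Only the top-level parts are inspected here.
    for part in message['payload'].get('parts', []):
        if part["mimeType"] == "text/html":
            # The Gmail API returns the body as URL-safe base64.
            html = base64.urlsafe_b64decode(part["body"]["data"]).decode('utf-8')
            file_ids.extend(re.findall(DRIVE_LINK_PATTERN, html))
    return file_ids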
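
The function above only looks at the top-level parts. In a MIME message the text/html part can sit inside a nested multipart container, so it may be worth walking the payload recursively. The following is a minimal sketch of that idea, not part of the original answer: the helper names `iter_parts` and `extract_drive_ids` are hypothetical, and the looser pattern (which does not require the `/view?usp=drive_web` suffix) is an assumption about what the links look like.

```python
import base64
import re

# Assumption: the file ID is the path segment right after "/file/d/".
DRIVE_ID_PATTERN = re.compile(r"https://drive\.google\.com/file/d/([^/?\"'<>\s]+)")


def iter_parts(payload):
    """Yield the payload itself and every nested part, depth first."""
    yield payload
    for part in payload.get("parts", []):
        yield from iter_parts(part)


def extract_drive_ids(payload):
    """Collect unique Drive file IDs from all text/html parts of a payload."""
    ids = []
    for part in iter_parts(payload):
        data = part.get("body", {}).get("data")
        if part.get("mimeType") != "text/html" or not data:
            continue
        # Restore any stripped base64 padding before decoding.
        padded = data + "=" * (-len(data) % 4)
        html = base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")
        for file_id in DRIVE_ID_PATTERN.findall(html):
            if file_id not in ids:
                ids.append(file_id)
    return ids
```

It takes the `payload` dictionary of a message that has already been fetched, for example `extract_drive_ids(message["payload"])`.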

Once you have the IDs, you will need to call the Drive API.

You could follow the example in the docs:

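import io
from googleapiclient.http import MediaIoBaseDownload

# gmail_service and drive_service are separate clients, one per API.
file_ids = get_file_ids(gmail_service, "me", "[YOUR_MSG_ID]")

for file_id in file_ids:
    request = drive_service.files().get_media(fileId=file_id)
    fh = io.BytesIO()
    downloader = MediaIoBaseDownload(fh, request)
    done = False
    while not done:
        status, done = downloader.next_chunk()
        print("Download %d%%." % int(status.progress() * 100))
    # fh now holds the file contents; fh.getvalue() returns them as bytes.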
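
The loop above keeps the downloaded bytes in memory. For large files it may be preferable to stream straight to disk. Here is a rough sketch of one way to do that, under some assumptions: the function name `download_drive_file` and the `downloader_cls` parameter are hypothetical and exist only so that the downloader can be swapped out, and `drive_service` is assumed to be a client built for the Drive API.

```python
def download_drive_file(drive_service, file_id, dest_path, downloader_cls=None):
    """Stream one Drive file to dest_path in chunks and return the path."""
    if downloader_cls is None:
        # Imported lazily so the default is only needed when actually used.
        from googleapiclient.http import MediaIoBaseDownload as downloader_cls

    request = drive_service.files().get_media(fileId=file_id)
    with open(dest_path, "wb") as fh:
        downloader = downloader_cls(fh, request)
        done = False
        while not done:
            # Each call writes the next chunk of the file into fh.
            _status, done = downloader.next_chunk()
    return dest_path
```

A call might look like `download_drive_file(drive_service, file_id, "attachment.bin")`, where the destination name is a placeholder of your choosing.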

Remember that because you are now using the Drive API as well as the Gmail API, you'll need to add a Drive scope to the scopes in your project. Also remember to enable the Drive API in the developers console, update your OAuth consent screen and credentials, and delete the local token.pickle file so that the new scopes are requested on the next run.
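
As an illustration of the scope change, the sketch below lists one read-only scope per API and builds a client for each from the same credentials. It assumes that read-only access is enough for your use case and that `creds` was obtained through your usual OAuth flow; the helper name `build_services` and its optional `build` parameter are hypothetical.

```python
# Read-only scopes for both APIs (assumption: no write access is needed).
SCOPES = [
    "https://www.googleapis.com/auth/gmail.readonly",
    "https://www.googleapis.com/auth/drive.readonly",
]


def build_services(creds, build=None):
    """Return (gmail_service, drive_service) built from the same credentials."""
    if build is None:
        # Imported lazily; this is the discovery-based client builder.
        from googleapiclient.discovery import build
    gmail_service = build("gmail", "v1", credentials=creds)
    drive_service = build("drive", "v3", credentials=creds)
    return gmail_service, drive_service
```

The Gmail client is then used to read the message and the Drive client to fetch the linked files.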


Upvotes: 2
