Jim isaac

Reputation: 860

Upload a large file from client to Google Drive using google-api-client resumable upload in Flask/Python, keeping it in memory

I am trying to upload a large file to Google Drive using the google-api-client with resumable upload. The problem is that I don't want the file object to be saved/written to my file system. I want to read it in chunks and upload those same chunks to Google Drive via resumable upload. Is there any way to achieve this, where I can send chunks through the google-api-python-client?

Here is my sample code. It works, but it reads the entire file object from the client.

@app.route('/upload', methods=["GET", "POST"])
def upload_buffer():
    drive = cred()
    if request.method == "POST":
        mime_type = request.headers['Content-Type']
        body = {
            'name': "op.pdf",
            'mimeType': mime_type,
        }

        # As you can see here, the entire file stream is read into memory:
        chunk = BytesIO(request.stream.read())

        # I want to read in chunks and simultaneously send each chunk to GDrive:
        # chunk = BytesIO(request.stream.read(1024))
        # but if I send it like the above, only part of the file is uploaded to GDrive.

        media_body = MediaIoBaseUpload(chunk, chunksize=1024, mimetype=mime_type,
                                       resumable=True)

        return drive.files().create(body=body,
                                    media_body=media_body,
                                    fields='id,name,mimeType,createdTime,modifiedTime').execute()

    return render_template("upload_image.html")

This is how I approached it using the Google REST APIs directly:

@app.route('/upload3', methods=["GET", "POST"])
def upload_buff():
    if request.method == "POST":
        content_length = request.headers['Content-Length']

        access_token = '####'

        headers = {"Authorization": "Bearer " + access_token,
                   "Content-Type": "application/json",
                   "Content-Length": content_length}
        params = {
            "name": "file_name.pdf",
            "mimeType": "application/pdf"
        }
        r = requests.post("https://www.googleapis.com/upload/drive/v3/files?uploadType=resumable",
                          headers=headers, data=json.dumps(params))

        location = r.headers['Location']
        print("----------------------------------------------------")
        print("GDrive upload URL: ", location)
        print("----------------------------------------------------")

        start = 0
        while True:
            chunk = request.stream.read(1024 * 1024)
            chunk_size = len(chunk)
            print("----------------------------------------------------")
            print("Size of received chunk from client: ", chunk_size)
            print("----------------------------------------------------")
            if chunk_size == 0:
                break
            end = start + chunk_size - 1
            headers = {'Content-Range': 'bytes %d-%d/%s' % (start, end, content_length),
                       'Content-Length': str(chunk_size)}
            start = end + 1
            print("The headers set for the chunk upload: ", headers)
            r = requests.put(location, headers=headers, data=chunk)
            print("----------------------------------------------------")
            print("Response content: ", r.content)
            print("Response headers: ", r.headers)
            print("Response status: ", r.status_code)
            print("----------------------------------------------------")
        return r.content

    return render_template("upload_image.html")

Upvotes: 3

Views: 1196

Answers (1)

Jacques-Guzel Heron

Reputation: 2598

Reading your question and code, I assume that you saved the stream in a variable called chunk and want to divide it into 1024-byte blocks for a resumable upload. If my understanding of the question is correct, you can slice the bytes object chunk like this:

chunk = b"\x04\x09\x09\x01\x01\x01\x00\x03" # Example values
chunk[:3] # Equals to b"\x04\x09\x09"
chunk[-3:] # Equals to b"\x01\x00\x03"
chunk[3:5] # Equals to b"\x01\x01"

You can use this approach to slice the chunk into 1024-byte pieces. Please ask me if you need more help.
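For instance, a minimal sketch of that slicing (data is a hypothetical name for the payload already held in memory):

data = b"example payload " * 200  # hypothetical in-memory payload

# Walk the bytes object in 1024-byte slices; the final slice may be shorter.
for offset in range(0, len(data), 1024):
    piece = data[offset:offset + 1024]
    # hand `piece` to the next chunk upload here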


I apologize for my earlier misreading of your question. I now understand that you have a bytes object divided into chunks and want to upload it to Drive using a resumable upload. If this new assumption is correct, you can use the code I wrote for that scenario below. With this code there is no need to write anything to the hard drive.

#!/usr/bin/python
# -*- coding: utf-8 -*-
import json
import requests
from io import BytesIO

accessToken = '{YOUR ACCESS TOKEN HERE}'
fileData = BytesIO(requests.get(
    'https://upload.wikimedia.org/wikipedia/commons/c/cf/Alhambra_evening_panorama_Mirador_San_Nicolas_sRGB-1.jpg'
).content).getvalue()
fileSize = len(fileData)  # Size in bytes of the in-memory payload

# Step I - Chop data into chunks
wholeSize = fileSize
chunkSize = 4980736  # Almost 5 MB, and a multiple of 1024*256 as required
chunkTally = 0
chunkData = []
while wholeSize > 0:
    if (chunkTally + 1) * chunkSize > fileSize:
        chunkData.append(fileData[chunkTally * chunkSize:fileSize])
    else:
        chunkData.append(fileData[chunkTally * chunkSize:(chunkTally + 1) * chunkSize])
    wholeSize -= chunkSize
    chunkTally += 1

# Step II - Initiate resumable upload
headers = {'Authorization': 'Bearer ' + accessToken,
           'Content-Type': 'application/json'}
parameters = {'name': 'alhambra.jpg',
              'description': 'Evening panorama of Alhambra from Mirador de San Nicolás, Granada, Spain.'}
r = requests.post('https://www.googleapis.com/upload/drive/v3/files?uploadType=resumable',
                  headers=headers, data=json.dumps(parameters))
location = r.headers['location']

# Step III - File upload
chunkTally = 0
for chunk in chunkData:
    if (chunkTally + 1) * chunkSize - 1 > fileSize - 1:
        finalByte = fileSize - 1
        chunkLength = fileSize - chunkTally * chunkSize
    else:
        finalByte = (chunkTally + 1) * chunkSize - 1
        chunkLength = chunkSize
    headers = {'Content-Length': str(chunkLength),
               'Content-Range': 'bytes ' + str(chunkTally * chunkSize)
               + '-' + str(finalByte) + '/' + str(fileSize)}
    r = requests.put(location, headers=headers, data=chunk)
    print(r.text)  # Response
    chunkTally += 1

As an example, this script takes a photo from Wikimedia Commons; you can use your file stream instead. After getting the data, the code determines the file size from the length of the in-memory bytes object, since the file is never written to the hard drive.

The next step is to chop the file into chunks smaller than 5 MB. I made sure to use a multiple of 1024*256, as detailed in the docs. The data is iterated until it's divided into almost-5 MB chunks (the final one can be smaller).
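As a quick illustration of that constraint (a sanity check only, not part of the script above):

# Drive resumable uploads require non-final chunks to be multiples of 256 KiB.
assert 4980736 % (1024 * 256) == 0  # 4980736 = 19 * 262144, so the chunk size is valid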

After that operation, the code initiates a resumable upload as documented, using OAuth 2.0 for authentication. In this step I used some example metadata for my file; you can read about other properties under Files properties. Finally, the script saves the upload location returned by the server in a variable for the subsequent chunk uploads.

In the final step, the chunks are iterated and uploaded one by one. First, a header is built according to the specifications. With the header, chunk, and upload location ready, the upload request is made. After every chunk, the response is printed to log errors; after the final chunk, it shows the metadata of the uploaded file, which marks the end of the complete operation. As a final note, I wrote and tested this script in Python 3. If you have any doubts, please don't hesitate to ask for clarification.
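One detail worth knowing if you extend this: while an upload is still in progress, Drive answers each intermediate chunk with HTTP 308 and a Range header reporting how many bytes it has stored so far. A minimal sketch of such a check (check_chunk_response is a hypothetical helper, not part of the script above):

# Hypothetical helper: interpret the response of one chunk PUT.
def check_chunk_response(r):
    if r.status_code == 308:
        # Intermediate chunk accepted; Range reports bytes stored so far,
        # e.g. "bytes=0-4980735".
        print("Server has received:", r.headers.get('Range'))
    elif r.status_code in (200, 201):
        # Final chunk accepted; the body carries the file metadata.
        print("Upload finished:", r.text)
    else:
        r.raise_for_status()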

Upvotes: 1
