Reputation: 10843
I'm trying to download a Google doc to PDF or Sheet to XLS given an ID programmatically from the CLI.
Steps I've tried so far:
I'm a bit stuck now and I am not sure how to proceed. Any suggestions?
Upvotes: 2
Views: 231
Reputation: 388
Warning: hopefully informative wall of text ahead! I've also uploaded the full Jupyter Notebook for you to clone and run here since, as you've realized, putting this sort of stuff together can be challenging.
Since we're going to be exporting files via the google drive API, we need credentials for that scope as detailed in https://developers.google.com/drive/api/v3/reference/files/export#auth.
However, first we need to choose an authentication method as detailed in https://developers.google.com/identity/protocols/oauth2#scenarios.
Since you mentioned creating a GCP project, I assume you're interested in using a GCP service account as detailed in https://developers.google.com/identity/protocols/oauth2#serviceaccount
You can create a service account at https://console.developers.google.com/apis/credentials or as explained in https://developers.google.com/identity/protocols/oauth2/service-account#creatinganaccount
Make sure to enable domain-wide-delegation for that service account while creating it and grant it https://www.googleapis.com/auth/drive
scope under https://admin.google.com/ac/owl/domainwidedelegation since you otherwise won't be able to impersonate other users, including yourself, and download their files.
We then use the SERVICE_ACCOUNT_FILE
we just downloaded and the SCOPES
we defined to create a Credentials object.
However, you'll need to first install the Python bindings for the Google API as per https://developers.google.com/drive/api/v3/quickstart/python (pip3 install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib
)
With that, the following should be enough to authenticate to the API:
from googleapiclient.discovery import build
from google.oauth2 import service_account
SCOPES = ['https://www.googleapis.com/auth/drive']
SERVICE_ACCOUNT_FILE = 'credentials.json'
credentials = service_account.Credentials.from_service_account_file(SERVICE_ACCOUNT_FILE,
scopes=SCOPES)
# Remember, you must have created credentials.json with domain-wide delegation!
credentials = credentials.with_subject('[email protected]')
# We then build a drive_v3 service using the credentials we just created
service = build('drive', 'v3', credentials=credentials)
We can access the files
resource as shown in https://developers.google.com/drive/api/v3/reference/files/get and request the metadata of a file to which [email protected]
has access https://docs.google.com/document/d/fileId/edit. In your case fileId=141g8UkQfdMQSTfIn475gHj1ezZVV16f5ONDxpWrrvts
.
files = service.files()
print(service.files().get(fileId='1U3eMevKxTwDxzvOBUsqa36zvwBzKPVYOFgy3k_9vxb8').execute())
{'kind': 'drive#file', 'id': '1U3eMevKxTwDxzvOBUsqa36zvwBzKPVYOFgy3k_9vxb8', 'name': 'empty', 'mimeType': 'application/vnd.google-apps.document'}
We access the files resource again but this time to export the file as detailed in https://developers.google.com/resources/api-libraries/documentation/drive/v3/python/latest/drive_v3.files.html#export This could also be achieved using https://developers.google.com/drive/api/v3/manage-downloads.
Valid MIME types are listed in https://developers.google.com/drive/api/v3/ref-export-formats.
fconr = files.export(fileId='1U3eMevKxTwDxzvOBUsqa36zvwBzKPVYOFgy3k_9vxb8',
mimeType='application/vnd.openxmlformats-officedocument.wordprocessingml.document')
fcont = fconr.execute()
print('{}...'.format(fcont[:10]))
file = open("/tmp/sample.doc", "wb")
file.write(fcont)
file.close()
b'MN\xc30\x10\x85O\xc0\x1d"'...
As you can see, fcont
contains a binary blob that corresponds to the document and of which I'm showing the first 10 bytes. Finally, the blob is saved to sample.doc
.
ls -alh1 /tmp/sample.doc
-rw-rw-r-- 1 jdsalaro jdsalaro 6,0K Jan 20 23:38 /tmp/sample.doc
As mentioned above, I encourage you to experiment with the Jupyter notebook once you've created the service account with domain-wide delegation, have saved it to credentials.json
and have granted it the https://www.googleapis.com/auth/drive
scope.
Upvotes: 1