Reputation: 506
I worked on this question for 2 days and found no direct answers on the web or SO, so I thought it would be prudent to do a Q&A here now that I've solved it.
Essentially, I have Python code running in an AWS Lambda that collects a bunch of data, processes it, and generates an Excel file with the info my team needs. I need to push this file out to Google Drive (to a shared folder) so that the whole team can see the info.
The problem was that I was trying to do it with MediaFileUpload from the Google Drive API. That method takes in a filename as a string, like this (3rd-to-last line):
def upload_file_to_gdrive(folder_id, filename, FOLDER):
    # If modifying these scopes, delete the file token.pickle.
    SCOPES = ['https://www.googleapis.com/auth/drive.file', 'https://www.googleapis.com/auth/drive',
              'https://www.googleapis.com/auth/drive.appdata']
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists(FOLDER + 'token.pickle'):
        with open(FOLDER + 'token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        print('creds didnt exist or were invalid')
        if creds and creds.expired and creds.refresh_token:
            print('creds were expired')
            creds.refresh(gRequest())
        else:
            # from_client_secrets_file expects a path, not an open file handle
            flow = InstalledAppFlow.from_client_secrets_file(FOLDER + 'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open(FOLDER + 'token.pickle', 'wb') as token:
            pickle.dump(creds, token)
    service = build('drive', 'v3', credentials=creds, cache_discovery=False)
    ### filename is something like '2020-07-02_data.xlsx'
    ### FOLDER is a local path like 'c:/datafolder/'
    file_info = MediaFileUpload(FOLDER + filename, mimetype='application/vnd.google-apps.spreadsheet')
    upload_metadata = {'name': filename, 'parents': [folder_id], 'mimeType': 'application/vnd.google-apps.spreadsheet'}
    return service.files().create(body=upload_metadata, media_body=file_info, fields='id').execute()
Well, that works fine for local files on my computer, but how do I do this for S3 files? I cannot pass "s3://mybucket/my_file.xlsx" to MediaFileUpload().
Upvotes: 0
Views: 939
Reputation: 506
To solve this, I just needed to use the MediaIoBaseUpload() method from the Google Drive API instead. The trick was to read the xlsx file contents from the S3 file, wrap them in a BytesIO object, and pass that into MediaIoBaseUpload. Then it all worked.
import pickle
import json
import s3fs
from io import BytesIO
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseUpload
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request as gRequest

def upload_file_to_gdrive(folder_id, filename, CRED_FOLDER, S3_FOLDER):
    s3 = s3fs.S3FileSystem(anon=False, key=<AWS_KEY>, secret=<AWS_SECRET>)
    # If modifying these scopes, delete the file token.pickle.
    SCOPES = ['https://www.googleapis.com/auth/drive.file', 'https://www.googleapis.com/auth/drive',
              'https://www.googleapis.com/auth/drive.appdata']
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if s3.exists(CRED_FOLDER + 'token.pickle'):
        with s3.open(CRED_FOLDER + 'token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        print('creds didnt exist or were invalid')
        if creds and creds.expired and creds.refresh_token:
            print('creds were expired')
            creds.refresh(gRequest())
        else:
            # from_client_secrets_file expects a local path, so load the JSON
            # from S3 and build the flow from the parsed config instead.
            with s3.open(CRED_FOLDER + 'credentials.json', 'rb') as f:
                flow = InstalledAppFlow.from_client_config(json.load(f), SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with s3.open(CRED_FOLDER + 'token.pickle', 'wb') as token:
            pickle.dump(creds, token)
    service = build('drive', 'v3', credentials=creds, cache_discovery=False)
    ### filename is still like '2020-07-02_site_ads_txt.xlsx'
    ### but S3_FOLDER is a valid s3 bucket path
    with s3.open(S3_FOLDER + filename, 'rb') as f:
        fbytes = BytesIO(f.read())
    media = MediaIoBaseUpload(fbytes, mimetype='application/vnd.google-apps.spreadsheet')
    upload_metadata = {'name': filename, 'parents': [folder_id], 'mimeType': 'application/vnd.google-apps.spreadsheet'}
    return service.files().create(body=upload_metadata, media_body=media, fields='id').execute()
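For context, here is a minimal sketch of how this function might be called from the Lambda entry point once the workbook has been written to S3. The bucket paths, Drive folder ID, and filename below are placeholders I made up for illustration, not values from my actual setup:

# Hypothetical Lambda handler -- all paths and IDs here are assumed placeholders.
def lambda_handler(event, context):
    filename = '2020-07-02_data.xlsx'           # assumed: produced earlier in the run
    gdrive_folder_id = '<GDRIVE_FOLDER_ID>'     # ID of the shared Google Drive folder
    cred_folder = 's3://mybucket/credentials/'  # where token.pickle / credentials.json live
    s3_folder = 's3://mybucket/reports/'        # where the xlsx was written

    result = upload_file_to_gdrive(gdrive_folder_id, filename, cred_folder, s3_folder)
    print('Uploaded to Google Drive, file id:', result.get('id'))
    return {'statusCode': 200, 'body': result.get('id')}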
Upvotes: 2