Reputation: 1731
I am trying to download a file from an API and upload (stream) it directly into S3.
My code for local downloads (which works perfectly):
import requests
import datetime
import os

headers = {'Authorization': 'apikey THISISHIDDEN'}
baseURL = 'https://api.test.au/busschedule/'
target_path = datetime.datetime.now().strftime('%Y-%m-%d schedule') + '.zip'

response = requests.get(baseURL, stream=True, headers=headers)
handle = open(target_path, "wb")
for chunk in response.iter_content(chunk_size=512):
    if chunk:  # filter out keep-alive new chunks
        handle.write(chunk)
handle.close()
My attempt to download and stream to S3 (which didn't work):
# import requests
import datetime
import os
import boto3
import botocore.vendored.requests.packages.urllib3 as urllib3

# Get environment variables from serverless.yml
bucket = "bucket"
s3folder = "schedules"

# Set standard script parameters
headers = {'Authorization': 'apikey THISISHIDDEN'}
baseURL = 'https://api.test.au/busschedule/'

def run(event, context):
    s3 = boto3.client('s3')
    datetimestamp = datetime.datetime.today().strftime('%Y%m%dT%H%M%S')
    filename = datetimestamp + " bus schedule.zip"
    key = s3folder + '/' + filename  # your desired s3 path or filename
    http = urllib3.PoolManager()
    s3.upload_fileobj(http.request('GET', baseURL,
                                   headers=headers, preload_content=False),
                      bucket, key)

def main():
    run({}, {})

if __name__ == "__main__":
    exit(main())
The error I get returned by CloudWatch is:
InsecureRequestWarning: Unverified HTTPS request is being made. Timeout after 300.10s.
EDIT: The Lambda function has a timeout of 300 seconds, which should be more than long enough to download the file (6 MB); downloading it locally takes around 10 seconds. Does anyone have a better approach to this?
Upvotes: 4
Views: 10292
Reputation: 106
Another way of uploading files (even larger than 6 MB) from AWS Lambda:
Step 1: Create a pre-signed URL for the GET or PUT request and return that URL as the Lambda response (see the sketch below).
Step 2: Use this URL in the file-uploader class on the UI.
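A minimal sketch of Step 1 using boto3's standard generate_presigned_url call; the bucket and key names here are placeholders, not values from the question:
import boto3

s3 = boto3.client('s3')

# Generate a pre-signed PUT URL; "my-bucket" and "schedules/file.zip"
# are hypothetical names.
url = s3.generate_presigned_url(
    ClientMethod='put_object',
    Params={'Bucket': 'my-bucket', 'Key': 'schedules/file.zip'},
    ExpiresIn=3600,  # URL stays valid for one hour
)
# Return `url` from the Lambda handler; the client then PUTs the file
# bytes directly to S3, so the upload never passes through Lambda.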
Please feel free to give feedback on this approach.
Upvotes: 0
Reputation: 1731
Resolved this issue using the 'smart_open' library:
import requests
from smart_open import smart_open

response = requests.get(baseURL, stream=True, headers=headers)
s3url = 's3://' + bucket + '/' + key
with smart_open(s3url, 'wb') as fout:
    fout.write(response.content)
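Note that response.content buffers the entire download in memory before writing; a chunked copy keeps memory usage flat for larger files. A sketch under the same assumptions (baseURL, headers, and s3url as above):
# Stream the HTTP response into S3 in 512 KB chunks instead of
# buffering the whole body via response.content.
response = requests.get(baseURL, stream=True, headers=headers)
with smart_open(s3url, 'wb') as fout:
    for chunk in response.iter_content(chunk_size=512 * 1024):
        if chunk:  # filter out keep-alive chunks
            fout.write(chunk)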
I have another issue to resolve (Lambda permissions) but this will be a separate question. Running this locally worked a treat.
Upvotes: 1