dandev91

Reputation: 1731

AWS Lambda (Python) - Downloading file from internet and upload directly to AWS S3

I am trying to download a file from an API and upload (stream) it directly into S3.

My code for local downloads (which works perfectly):

import requests
import datetime
import os

headers = {'Authorization': 'apikey THISISHIDDEN'}
baseURL = 'https://api.test.au/busschedule/'
target_path = datetime.datetime.now().strftime('%Y-%m-%d schedule') + '.zip'

response = requests.get(baseURL, stream=True, headers=headers)
handle = open(target_path, "wb")
for chunk in response.iter_content(chunk_size=512):
    if chunk:  # filter out keep-alive new chunks
        handle.write(chunk)
handle.close()

My attempt to download and stream to S3 (which didn't work):

# import requests
import datetime
import os
import boto3
import botocore.vendored.requests.packages.urllib3 as urllib3

# Get environment variables from serverless.yml
bucket = "bucket"  
s3folder = "schedules"

# Set standard script parameters
headers = {'Authorization': 'apikey THISISHIDDEN'}
baseURL = 'https://api.test.au/busschedule/'


def run(event, context):
    s3 = boto3.client('s3')
    datetimestamp = datetime.datetime.today().strftime('%Y%m%dT%H%M%S')
    filename = datetimestamp + " bus schedule.zip"
    key = s3folder + '/' + filename  # your desired s3 path or filename
    http = urllib3.PoolManager()
    s3.upload_fileobj(
        http.request('GET', baseURL, headers=headers, preload_content=False),
        bucket,
        key,
    )


def main():
    run({}, {})


if __name__ == "__main__":
    exit(main())

The error I get returned by CloudWatch is:

InsecureRequestWarning: Unverified HTTPS request is being made.  Timeout after 300.10s.

EDIT: The Lambda function has a timeout of 300 seconds, which should be more than long enough to download the file (6 MB); downloading locally finishes within 10 or so seconds. Does anyone have a better approach to this?

Upvotes: 4

Views: 10292

Answers (2)

bhave7

Reputation: 106

Another way of uploading a file (even larger than 6 MB) with AWS Lambda:

Step 1: Create a pre-signed URL for the GET or PUT request and return this URL as the response.

Step 2: Use this URL in your file uploader class on the UI.

Pros

  1. The code scales with file size: Lambda's payload is limited to 6 MB, but this logic works for both smaller and larger files.
  2. A pre-signed URL is secure: you can set a time limit after which the URL expires.

Cons

  1. It is a two-step process, so you need to hit two APIs to upload a single file.

Please feel free to give feedback on this approach.

Upvotes: 0

dandev91

Reputation: 1731

Resolved this issue using the `smart_open` library:

import requests
from smart_open import smart_open

response = requests.get(baseURL, stream=True, headers=headers)
s3url = 's3://' + bucket + '/' + key
with smart_open(s3url, 'wb') as fout:
    fout.write(response.content)  # note: .content reads the whole body into memory

I have another issue to resolve (Lambda permissions), but this will be a separate question. Running this locally worked a treat.
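Since `response.content` buffers the whole body in memory, a chunked variant may be safer for bigger files. A sketch, assuming `smart_open` and `requests` are available (`copy_stream` and `download_to_s3` are hypothetical helper names, not part of either library):

```python
import requests
from smart_open import open as s_open  # smart_open's file-like open()


def copy_stream(source, dest_url, chunk_size=1024 * 1024):
    """Copy any file-like object to dest_url (e.g. 's3://bucket/key') in chunks."""
    with s_open(dest_url, 'wb') as fout:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)


def download_to_s3(url, s3url, headers):
    # stream=True keeps the body unread; response.raw exposes read().
    with requests.get(url, stream=True, headers=headers) as response:
        response.raise_for_status()
        copy_stream(response.raw, s3url)
```

`smart_open`'s `open()` also accepts local paths, which makes the copy helper easy to exercise without AWS credentials.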

Upvotes: 1
