Reputation: 7004
I have been using Google Cloud Functions (GCF) to set up a serverless environment. This works fine and covers most of the functionality I need.
However, for one specific module, extracting data from FTP servers, parsing the files from one provider takes longer than 540s. For this reason, the task gets timed out when deployed as a Cloud Function.
In addition, some FTP servers require whitelisting the IP address that makes the requests. With Cloud Functions, unless you somehow reserve a static address or a range, this is not possible.
I am therefore looking for an alternative solution to execute a Python script in the cloud on the Google platform. The requirements are:
- tasks must be able to run for longer than 540s;
- requests must originate from a static IP address (or range) that the FTP providers can whitelist.
What is the best option out there for these kinds of needs?
Upvotes: 0
Views: 391
Reputation: 222
Here is how I download files from FTP to Google Cloud Storage with Google Cloud Functions. It takes less than 30 seconds (depending on the file size).
#import libraries (requires the wget package: pip install wget)
from google.cloud import storage
import wget

def importFile(request):
    #set up the storage client
    client = storage.Client()
    #get the bucket (name without the gs:// prefix)
    bucket = client.get_bucket('BUCKET-NAME')
    blob = bucket.blob('file-name.csv')
    #see if the file already exists
    if not blob.exists():
        try:
            #for non-public FTP files, embed the credentials in the URL
            #(placeholder account, password and host)
            link = 'ftp://account:password@ftp.example.com/folder/file.csv'
            #save the download in the /tmp folder of Cloud Functions
            ftpfile = wget.download(link, out='/tmp/destination-file-name.csv')
            #copy the file to Google Storage
            blob.upload_from_filename(ftpfile)
            print('Copied file to Google Storage!')
        #print the error if the download or upload fails
        except BaseException as error:
            print('An exception occurred: {}'.format(error))
    else:
        print('File already exists in Google Storage')
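If you would rather avoid the third-party wget package, the same download step can be done with the standard library's ftplib. A minimal sketch, assuming the same placeholder host, credentials and paths as above:

from ftplib import FTP

def downloadFromFtp():
    #connect and authenticate (placeholder host and credentials)
    ftp = FTP('ftp.example.com')
    ftp.login('account', 'password')
    #stream the remote file into the /tmp folder of Cloud Functions
    local_path = '/tmp/destination-file-name.csv'
    with open(local_path, 'wb') as f:
        ftp.retrbinary('RETR /folder/file.csv', f.write)
    ftp.quit()
    return local_path

The returned path can be passed straight to blob.upload_from_filename() as in the function above.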
Upvotes: 1
Reputation: 15266
The notion of a Cloud Function is primarily that of a microservice ... something that runs for a relatively short period of time. In your story, we seem to have actions that can run for an extended period of time. This would seem to lend itself to running some form of compute engine. The two that immediately come to mind are Google Compute Engine (GCE) and Google Kubernetes Engine (GKE).

Let us think about the Compute Engine. Think of this as a Linux VM where you have 100% control over it. This needn't be a heavyweight thing ... Google provides micro compute engines which are pretty darn tiny. You can have one or more of these, including the ability to dynamically scale out the number of instances if load on the set becomes too high.

On your compute engine, you can create any environment you wish ... including installing a Python environment and running Flask (or other) to process incoming requests, as in the sketch below. You can associate your compute engine with a static IP address, or associate a static IP address with a load balancer front-ending your engines.
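To make that concrete, here is a minimal sketch of such a Flask service running on a Compute Engine VM; the FTP host, credentials, bucket and file names are placeholders, and a real deployment would use a WSGI server and proper secret management instead of hard-coded credentials:

from ftplib import FTP
from flask import Flask, jsonify
from google.cloud import storage

app = Flask(__name__)

@app.route('/import', methods=['POST'])
def run_import():
    #download from FTP to local disk; the 540s Cloud Functions limit does not apply on a VM
    ftp = FTP('ftp.example.com')        #placeholder host
    ftp.login('account', 'password')    #placeholder credentials
    local_path = '/tmp/file.csv'
    with open(local_path, 'wb') as f:
        ftp.retrbinary('RETR /folder/file.csv', f.write)
    ftp.quit()
    #upload the result to Cloud Storage
    bucket = storage.Client().get_bucket('BUCKET-NAME')   #placeholder bucket
    bucket.blob('file.csv').upload_from_filename(local_path)
    return jsonify(status='done')

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)

Because the VM (or the load balancer in front of it) holds a reserved static external IP, the FTP provider can whitelist that address.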
Upvotes: 2