Reputation: 408
I am using spaCy's en_core_web_sm model in my Python AWS Lambda. I ran pip freeze > requirements.txt to get all the dependencies into the requirements.txt file; en-core-web-sm==2.1.0 is one of the lines in it. When I try to make a serverless deployment, I get:

    ERROR: Could not find a version that satisfies the requirement en-core-web-sm==2.1.0 (from versions: none)
    ERROR: No matching distribution found for en-core-web-sm==2.1.0
Even though I am not using Heroku, I followed Heroku Deployment Error: No matching distribution found for en-core-web-sm and added the line https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.1.0/en_core_web_sm-2.1.0.tar.gz#egg=en_core_web_sm==2.1.0 to my requirements.txt file, only to get:

    Unzipped size must be smaller than 262144000 bytes (Service: AWSLambdaInternal; Status Code: 400; Error Code: InvalidParameterValueException; Request ID: XxX-XxX)

How do I wire up en_core_web_sm to my Lambda?
Upvotes: 2
Views: 2361
Reputation: 722
Take advantage of the fact that the model is a separate component from the library: upload the model to an S3 bucket, and before initialising spaCy, download it from S3 into the Lambda's /tmp directory. This is accomplished by the method below.
import os
import boto3

def download_dir(dist, local, bucket):
    """Recursively download every S3 object under the `dist` prefix into `local`."""
    client = boto3.client('s3')
    resource = boto3.resource('s3')
    paginator = client.get_paginator('list_objects')
    for result in paginator.paginate(Bucket=bucket, Delimiter='/', Prefix=dist):
        if result.get('CommonPrefixes') is not None:
            # Recurse into "subdirectories" (common prefixes).
            for subdir in result.get('CommonPrefixes'):
                download_dir(subdir.get('Prefix'), local, bucket)
        if result.get('Contents') is not None:
            for file in result.get('Contents'):
                dest_path = local + os.sep + file.get('Key')
                if not os.path.exists(os.path.dirname(dest_path)):
                    os.makedirs(os.path.dirname(dest_path))
                # Skip keys that only denote a directory.
                if not dest_path.endswith('/'):
                    resource.meta.client.download_file(bucket, file.get('Key'), dest_path)
And the code using spaCy looks like this:
import os
import spacy

# `lang` is the S3 key prefix of the model and `mapping_bucket` is the bucket
# it was uploaded to; set both for your environment.
if not os.path.isdir('/tmp/en_core_web_sm-2.0.0'):
    download_dir(lang, '/tmp', mapping_bucket)

spacy.util.set_data_path('/tmp')
nlp = spacy.load('/tmp/en_core_web_sm-2.0.0')

doc = nlp(spacy_input)
for token in doc:
    print(token.text, token.pos_, token.dep_)
Upvotes: 4