Patch

Reputation: 754

Passing a large file to Celery for processing isn't working

I want to save a file to AWS S3, and I am using Celery because I don't want to wait until the function finishes writing the file. The problem is that when I send the file through a Celery task, the object that ends up in my AWS file storage is not the same size as the actual file.

This is where I send it to the Celery task:

file_to_put = str(file_to_put)  # because you can't send an object to a Celery function
write_file_aws.delay(file_full_name, file_to_put)

The Celery task itself:

@celery.task(name="write_file_to_aws")
def write_file_aws(file_full_name, file_to_put):
    file_to_put = bytearray(file_to_put)
    s3 = boto3.resource('s3')
    s3.Object(BUCKET, file_full_name).put(Body=file_to_put)
    return "Request sent!"

The resulting file size is wrong (e.g. 1KB instead of 22KB; with pictures it's even 710KB instead of 230KB), and the file itself is just gibberish. Why would this happen? Is it because I converted it to a string? If so, what else can I do?
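The symptoms are consistent with the `str()` round trip mangling the bytes: in Python 3, `str()` on a bytes object produces its textual repr (including the `b'...'` wrapper and escape sequences), not the raw data. A minimal demonstration in plain Python, no Celery involved (the sample bytes are arbitrary):

```python
data = b"\x89PNG\r\n"          # sample binary header bytes
as_str = str(data)             # yields the repr "b'\\x89PNG\\r\\n'", not the raw bytes
recovered = bytearray(as_str, "utf-8")

print(len(data), len(recovered))   # 6 vs 14: the sizes no longer match
print(bytes(recovered) == data)    # False: the content is gibberish relative to the original
```

This is why the uploaded object's size differs from the source file and why its contents look garbled.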

Upvotes: 1

Views: 2718

Answers (2)

Nitin Nain

Reputation: 5483

You're serializing a large file and passing it as an argument to the task. I assume you're running on EC2, so you could instead store the file on the EC2 instance's instance store or an EBS volume first (both are faster to write to than S3), then pass the *path* to this file as the argument to the Celery task. The Celery worker then copies the file to S3.

i.e. this:

def write_file_aws(file_full_name, file_to_put):

will become:

def write_file_aws(file_full_name, path_to_local_file):

Here's a primer on AWS EC2 storage options: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Storage.html

Upvotes: 1

DejanLekic

Reputation: 19797

For valid reasons (in short: task arguments travel through the broker and may be held in memory, so a large object can cause memory errors), you can't pass large objects to your Celery tasks. Instead, pass a reference to wherever the Celery task can access the large object. If it is a file, put it on a shared filesystem (NFS, for example) accessible by all Celery nodes, and pass the file name (and a path, if that makes it easier for you).

Upvotes: 1
