guerreiro

Reputation: 93

GAE - processing external URI

I am trying to process URIs on GAE flexible; specifically, I am processing PDF files through pdf2image. Whenever I pass the URI to pdf2image's convert_from_path, GAE throws

File not Found

but when I run the same process on my local machine, it executes with no errors. Do I need to set something up on Google App Engine to allow it?

Upvotes: 1

Views: 153

Answers (1)

Alex

Reputation: 5276

Where is this PDF?

Your title says 'external URI', but pdf2image's docs for convert_from_path seem to indicate that it expects a path to a local file, such as one sitting in your code.

If the file is indeed sitting in your project code and getting deployed with your project, try using this to convert a relative path to an absolute one:

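import os
from pdf2image import convert_from_path

# Directory containing this source file, independent of the working directory
curr_dir = os.path.dirname(os.path.realpath(__file__))
images = convert_from_path(os.path.join(curr_dir, 'my/relative/path/example.pdf'))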
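If it's unclear whether the file actually got deployed, a small check before calling pdf2image can make the error more informative. This is only a sketch: resolve_bundled_file and its base_dir parameter are hypothetical names I made up for illustration, not part of pdf2image.

```python
from pathlib import Path


def resolve_bundled_file(relative_path, base_dir=None):
    """Return the absolute path of a file shipped with the project.

    base_dir defaults to the directory containing this source file
    (hypothetical helper, for illustration only).
    """
    if base_dir is not None:
        base = Path(base_dir)
    else:
        base = Path(__file__).resolve().parent
    candidate = (base / relative_path).resolve()
    if not candidate.is_file():
        # Report the full path that was checked, which helps when the
        # deployed layout differs from the local one.
        raise FileNotFoundError(f"No file at {candidate}")
    return candidate
```

You would then pass str(resolve_bundled_file('my/relative/path/example.pdf')) to convert_from_path. If this raises on GAE but not locally, the PDF is probably not being included in the deployment.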

Edit:

For PDFs on GCS, I would handle the download from GCS separately and then use convert_from_bytes instead of convert_from_path.

You'd set up your connection to GCS like this:

https://cloud.google.com/appengine/docs/flexible/python/using-cloud-storage

Use this function to get the GCS blob:

https://googlecloudplatform.github.io/google-cloud-python/latest/storage/buckets.html#google.cloud.storage.bucket.Bucket.get_blob

And then use this function to actually download the bytes:

https://googlecloudplatform.github.io/google-cloud-python/latest/storage/blobs.html#google.cloud.storage.blob.Blob.download_as_string

So something like this:

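from google.cloud import storage
from pdf2image import convert_from_bytes

client = storage.Client()
bucket = client.get_bucket('my-bucket')
# GCS object names normally have no leading slash; get_blob returns None if the object is missing
blob = bucket.get_blob('path/to/blob.pdf')
# Newer google-cloud-storage releases name this download_as_bytes
pdf_bytes = blob.download_as_string()  # avoid shadowing the built-in bytes
images = convert_from_bytes(pdf_bytes)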
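If the URI turns out to be a plain HTTP(S) URL rather than a GCS object, the same idea should apply: download the bytes yourself, then hand them to convert_from_bytes. A rough sketch using only the standard library for the download; fetch_pdf_bytes and convert_remote_pdf are hypothetical helper names, and the "%PDF" check is just a light sanity test that I'm assuming is good enough for typical files.

```python
import urllib.request


def fetch_pdf_bytes(url, timeout=30):
    """Download the resource at url and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()
    # Typical PDFs begin with the "%PDF" marker; this catches cases such as
    # an HTML error page being returned instead of the document.
    if not data.startswith(b"%PDF"):
        raise ValueError(f"Content at {url} does not look like a PDF")
    return data


def convert_remote_pdf(url):
    """Fetch a PDF by URL and convert its pages to images."""
    # Imported here so the download helper can be used (and tested)
    # even where pdf2image and poppler are not installed.
    from pdf2image import convert_from_bytes

    return convert_from_bytes(fetch_pdf_bytes(url))
```

This keeps the network step separate from the conversion, so a failure points clearly at either the download or pdf2image.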

Upvotes: 4
