unaiherran

Reputation: 1034

How to serve a pdf file from GCS in GAE?

I'm using Google App Engine in Python to handle a small webapp.

I have some files stored in GCS that I want to serve only if the user is logged in.

I thought it would be really easy, but I must be missing a step, since my code:

import cloudstorage as gcs
import webapp2
from google.appengine.api import users

class Handler(webapp2.RequestHandler):
    def write(self, *a, **kw):
        self.response.out.write(*a, **kw)

class testHandler (Handler):
    def get (self):
        bucket = '/my_bucket'
        filename = '/pdf/somedoc.pdf'
        user = users.get_current_user()
        if user:
            pdf = gcs.open(bucket+filename)
            self.write(pdf)

only gives:

<cloudstorage.storage_api.ReadBuffer object at 0xfbb931d0>

and what I need is the file itself.

Can anyone tell me which step I'm missing?

Thanks

Upvotes: 0

Views: 1405

Answers (3)

Paul Liang

Reputation: 776

Even though the OP has already answered their own question, I want to add a few thoughts.

The OP's code writes the content of the PDF file into the HTTP response:

self.write(pdf.read())

According to the GAE quota limits, the request fails if the response is larger than 32MB.
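
As a rough illustration of that limit, a handler could check the object's size before reading it into the response. This is only a sketch: MAX_RESPONSE_BYTES and fits_in_response are hypothetical names, and the size is assumed to come from something like gcs.stat(path).st_size.

```python
# Hypothetical helper; 32 MB is the response limit quoted above.
MAX_RESPONSE_BYTES = 32 * 1024 * 1024


def fits_in_response(size_bytes, limit=MAX_RESPONSE_BYTES):
    """Return True if an object of size_bytes can be written to the response."""
    return 0 <= size_bytes <= limit
```

If the check fails, the handler would need another way to serve the file, such as the temporary public copy described further down.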

Also, it would be good to set the urlfetch_timeout value, as the default of 5 seconds may not be enough in some circumstances and would result in a DeadlineExceededError.
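
A minimal sketch of raising that timeout, assuming the cloudstorage client's RetryParams class and the retry_params argument of gcs.open(). The helper name open_with_timeout and the 30-second default are my own choices, and the gcs module is passed in as a parameter only to keep the snippet self-contained.

```python
def open_with_timeout(gcs, path, timeout_seconds=30):
    """Open a GCS object for reading with a longer urlfetch timeout.

    `gcs` is the imported cloudstorage module (import cloudstorage as gcs),
    passed in so that this helper does not import it itself.
    """
    retry_params = gcs.RetryParams(urlfetch_timeout=timeout_seconds)
    return gcs.open(path, retry_params=retry_params)
```

In the OP's handler this would replace the plain gcs.open(bucket+filename) call, e.g. pdf = open_with_timeout(gcs, bucket + filename).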

Instead, I would recommend this: when a request is received, use the Google Cloud Storage API (not the GAE one) to copy the file to a temporary location. Make sure to set the ACL of the new object to publicly readable, then serve the public URL of the new object.

Also, add a task to a task queue and set the task's ETA to a timeout of your choice. When the task runs, it removes the file from the temporary location so that it can no longer be accessed.
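
The bookkeeping for that approach might look roughly like this. It is a sketch only: temp_object_name, public_url and cleanup_eta are hypothetical helpers, the 300-second lifetime is an arbitrary choice, and the copy, ACL change and task creation themselves are left out. The URL form assumes the object is served from storage.googleapis.com.

```python
import datetime
import uuid

try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2


def temp_object_name(source_name, prefix='tmp'):
    """Build a hard-to-guess name for the temporary public copy."""
    return '%s/%s/%s' % (prefix, uuid.uuid4().hex, source_name.lstrip('/'))


def public_url(bucket, object_name):
    """URL of an object whose ACL makes it publicly readable."""
    return 'https://storage.googleapis.com/%s/%s' % (bucket, quote(object_name))


def cleanup_eta(now, lifetime_seconds=300):
    """Time at which the cleanup task should delete the temporary copy."""
    return now + datetime.timedelta(seconds=lifetime_seconds)
```

The value returned by cleanup_eta() is what would be passed as the eta of the cleanup task.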

UPDATE:

Use service account authentication: generate a new JSON key and get the private key from it.

Set the scope to FULL_CONTROL, as we need to change ACL settings.

I haven't tested the code yet as I am at work, but I will when I have time.

import httplib2
from apiclient.discovery import build
from apiclient.errors import HttpError
from oauth2client.client import SignedJwtAssertionCredentials


# Need to modify ACL, therefore need full control access
GCS_SCOPE = 'https://www.googleapis.com/auth/devstorage.full_control'


def get_gcs_client( project_id, 
                    service_account=None,
                    private_key=None):

    credentials = SignedJwtAssertionCredentials(service_account, private_key, scope=GCS_SCOPE)

    http = httplib2.Http()
    http = credentials.authorize(http)
    # The Cloud Storage JSON API is published as version 'v1'
    service = build('storage', 'v1', http=http)

    return service

Upvotes: 1

asamarin

Reputation: 1594

I think you'd be better off using the Blobstore API on GCS to serve this kind of file. Based on "Using the Blobstore API with Google Cloud Storage", I've come up with this approach:

import cloudstorage as gcs
import webapp2

from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers

GCS_PREFIX = '/gs'
BUCKET = '/my_bucket'
FILE = '/pdf/somedoc.pdf'
BLOBSTORE_FILENAME = GCS_PREFIX + BUCKET + FILE

class GCSWebAppHandler(webapp2.RequestHandler):
    def get(self):
        blob_key = blobstore.create_gs_key(BLOBSTORE_FILENAME)
        self.response.headers['Content-Type'] = 'application/pdf'
        self.response.write(blobstore.fetch_data(blob_key, 0, blobstore.MAX_BLOB_FETCH_SIZE - 1))

class GCSBlobDlHandler(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self):
        blob_key = blobstore.create_gs_key(BLOBSTORE_FILENAME)
        self.send_blob(blob_key)

app = webapp2.WSGIApplication([
    ('/webapphandler', GCSWebAppHandler),
    ('/blobdlhandler', GCSBlobDlHandler)],
    debug=True)

As you can see, there are two example handlers here, GCSWebAppHandler and GCSBlobDlHandler (mapped to /webapphandler and /blobdlhandler). It's probably better to use the latter, since the former is limited by MAX_BLOB_FETCH_SIZE in fetch_data(), which is about 1MB; if the files you serve are smaller than that, either handler is fine.
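
If you did want to stay with fetch_data() for a larger file, one option would be to fetch it in several pieces. The sketch below only computes the byte ranges: fetch_ranges is a hypothetical helper, the total size is assumed to be known already, and each (start, end) pair would be passed to blobstore.fetch_data(blob_key, start, end) with chunk_size no larger than blobstore.MAX_BLOB_FETCH_SIZE.

```python
def fetch_ranges(total_size, chunk_size):
    """Yield inclusive (start, end) byte indexes covering total_size bytes."""
    start = 0
    while start < total_size:
        # end is inclusive, matching the end index used by fetch_data()
        end = min(start + chunk_size, total_size) - 1
        yield start, end
        start = end + 1
```

The response size limit still applies to the total, so send_blob() remains the simpler choice.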

Upvotes: 0

unaiherran

Reputation: 1034

After some thinking, a shower and a coffee, I realized I had two problems.

First, I was writing the file object itself (which shows up as its memory address), not the file's contents.

So the correct call would be:

self.write(pdf.read())
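
The difference can be reproduced outside App Engine. In this small sketch, io.BytesIO is only a stand-in for the buffer returned by gcs.open(), and the byte string is made up:

```python
import io

# io.BytesIO stands in for the buffer returned by gcs.open().
pdf = io.BytesIO(b'%PDF-1.4 example bytes')

as_text = str(pdf)    # a description of the object, including its address
content = pdf.read()  # the actual bytes stored in the buffer
```

Writing as_text gives a line like the one in the question, while content is what the browser needs.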

Second, I had to set the 'Content-Type' header to 'application/pdf' so that the browser treats the response as a PDF and not as text.
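
For illustration, the headers could be collected in one place like this. pdf_headers is a hypothetical helper, and the Content-Disposition line is an optional extra that I did not need for my case; it just suggests a file name to the browser.

```python
def pdf_headers(filename):
    """Response headers for showing a PDF in the browser."""
    return {
        'Content-Type': 'application/pdf',
        # Optional: 'inline' asks the browser to display the file,
        # and the filename is used if the user saves it.
        'Content-Disposition': 'inline; filename="%s"' % filename,
    }
```

Each entry would then be copied into self.response.headers before writing the body.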

Anyhow, the result was:

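import cloudstorage as gcs
import webapp2
from google.appengine.api import users

class pHandler(webapp2.RequestHandler):
    def write(self, *a, **kw):
        # Tell the browser the body is a PDF rather than plain text
        self.response.headers['Content-Type'] = 'application/pdf'
        self.response.out.write(*a, **kw)

class testHandler(pHandler):
    def get(self):
        bucket = '/my_bucket'
        filename = '/pdf/somedoc.pdf'
        user = users.get_current_user()
        if user:
            pdf = gcs.open(bucket + filename)
            # Write the file's contents, not the buffer object itself
            self.write(pdf.read())
            pdf.close()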

Upvotes: 2
