Matthew Lee

Reputation: 55

tensorflow_hub load memory limit on Heroku

Is it possible to reduce the memory usage while loading a tensorflow_hub model?
Right now it exceeds Heroku's memory quota, which is 512 MB.
Would it be possible to somehow split the loading? I've tried loading the model in a background thread, but that only solved the request timeout problem, not the memory issue.

from flask import Flask, render_template, url_for, make_response,jsonify,request
import tensorflow_hub as hub
import numpy as np
import tensorflow as tf
import threading


app = Flask(__name__, template_folder='templates')

# Shared model handle, populated by the background loading thread in task().
model = None


def semantic(search1, search2):
    # Encode both inputs and return their inner product as a similarity score.
    comparison = model([search1, search2])
    return np.inner(comparison[0], comparison[1])


def task():
    global model
    module_url = "https://tfhub.dev/google/universal-sentence-encoder/4"
    # alternative: "https://tfhub.dev/google/universal-sentence-encoder-large/5"
    model = hub.load(module_url)
    tf.keras.backend.clear_session()  # try to release graph-building memory


@app.route('/')
def menu():
    # Start loading the model in a background thread so this request does not time out.
    threading.Thread(target=task).start()
    return render_template("index.html")

@app.route('/<search1>/<search2>', methods=['POST', 'GET'])
def deploy(search1, search2):
    # Scale the similarity score to a percentage and return it as a string.
    compare = str(semantic(search1, search2) * 100)
    response = {
        "Semantic Similarity": compare
    }
    if request.method == 'POST':
        return make_response(jsonify(response), 200)
    else:
        return render_template("results.html", compare=compare)

Thank you for taking a look at this thread. I've been searching for answers for hours, but the only solutions I've found are to migrate to another platform or to pay for more memory.

Upvotes: 0

Views: 454

Answers (1)

Andrey Khorlin

Reputation: 213

To my knowledge, unfortunately, there is not much one can do in this case. The memory usage depends on TensorFlow's internal implementation of the logic that loads the weights and graph from their serialized representation, so you could file a feature request with TensorFlow asking for that logic to be made more memory-conscious.

Alternatively, when loading a model from tfhub.dev, the library copies its content to a local temporary directory. If that directory is memory-mapped, changing the cache location to a non-memory-mapped directory may help. This can be done by setting the TFHUB_CACHE_DIR environment variable, as in the sketch below.
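A minimal sketch of that idea, assuming /app/tfhub_cache is a disk-backed directory on the dyno (the path is only an illustration); the variable has to be set before the model is downloaded:

import os

# Point tensorflow_hub's download cache at a disk-backed directory instead of
# a memory-mapped temporary location. Must be set before hub.load() runs.
os.environ["TFHUB_CACHE_DIR"] = "/app/tfhub_cache"  # illustrative path

import tensorflow_hub as hub

model = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

The same variable can also be set outside the code, for example with heroku config:set TFHUB_CACHE_DIR=/app/tfhub_cache.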

Finally, if neither of these approaches works, switching to a smaller model is an option as well; the only code change is the module URL, as in the sketch below.
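A minimal sketch of that swap; the module path below is a placeholder, and the size and input interface of any candidate model should be verified on tfhub.dev first:

import tensorflow_hub as hub

# Placeholder URL: substitute a genuinely smaller module from tfhub.dev.
module_url = "https://tfhub.dev/google/<smaller-module>/1"
model = hub.load(module_url)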

Upvotes: 1
