Reputation: 12141
I'm building a web application using NLTK and Flask. It's a simple RESTful application that I deployed on Heroku, and everything went well at first. However, when the server started getting more requests, I reached Heroku's memory limit, which is 1.5GB. My guess is that it's because I'm loading nltk.RegexpParser every time a request comes in.
This is the code, which is really simple:
@app.route('/get_keywords', methods=['POST'])
def get_keywords():
    data_json = json.loads(request.data)
    text = urllib.unquote(data_json["sentence"])
    keywords = KeywordExtraction().extract(text)
    return ','.join(keywords)
And this is the keyword extraction bit.
import re
import nltk
nltk.data.path.append('./nltk_data/')
from nltk.corpus import stopwords

class KeywordExtraction:
    def extract(self, text):
        sentences = nltk.sent_tokenize(text)
        sentences = [nltk.word_tokenize(sent) for sent in sentences]
        sentences = [nltk.pos_tag(sent) for sent in sentences]
        grammar = "NP: {}"
        cp = nltk.RegexpParser(grammar)
        tree = cp.parse(sentences[0])
        keywords = [subtree.leaves()[0][0] for subtree in tree.subtrees(filter=lambda t: t.node == 'NP')]
        keywords_without_stopwords = [w for w in keywords if not w in stopwords.words('english')]
        return list(set(keywords_without_stopwords + tags))
I'm not sure whether the problem is with my code, Flask, or NLTK. I'm pretty new to Python. Any suggestions would be really appreciated.
I tested this with blitz.io, and after just 250 requests the server blew up and started throwing R15 errors.
Upvotes: 0
Views: 2548
Reputation: 298226
Start by caching things:
# Move these outside of the class declaration or make them class variables
stopwords = set(stopwords.words('english'))
grammar = "NP: {}"
cp = nltk.RegexpParser(grammar)
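To illustrate the "class variable" option, here is a minimal sketch of the pattern. `ExpensiveParser` is a hypothetical stand-in for `nltk.RegexpParser` (so the snippet runs without NLTK), and its `parse` logic is made up purely for demonstration. The point is only that the object is built once, when the class body is executed, rather than on every request:

```python
class ExpensiveParser:
    """Hypothetical stand-in for nltk.RegexpParser; counts how often it is built."""
    instances = 0

    def __init__(self, grammar):
        ExpensiveParser.instances += 1
        self.grammar = grammar

    def parse(self, tokens):
        # Toy logic for illustration only: keep title-cased tokens.
        return [t for t in tokens if t.istitle()]


class KeywordExtraction:
    # Built once when the class body runs, then shared by every instance.
    parser = ExpensiveParser("NP: {}")

    def extract(self, text):
        return self.parser.parse(text.split())


# Simulate several requests, each creating its own KeywordExtraction.
for _ in range(3):
    KeywordExtraction().extract("Flask and NLTK on Heroku")

print(ExpensiveParser.instances)
```

Even though a new `KeywordExtraction` is created per call, the parser constructor runs a single time.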
This can be sped up a little as well:
from itertools import ifilterfalse
...
keywords_without_stopwords = ifilterfalse(stopwords.__contains__, keywords)
return list(set(keywords_without_stopwords).union(tags)) # Can you cache `set(tags)`?
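On Python 3, `ifilterfalse` is named `itertools.filterfalse`. The following is a small self-contained sketch of the same filtering step under that assumption. `STOPWORDS` is a hypothetical three-word stand-in for `stopwords.words('english')`, and `remove_stopwords` is an illustrative helper name, not something from the question:

```python
from itertools import filterfalse

# Hypothetical stand-in for set(stopwords.words('english')), built once.
STOPWORDS = frozenset({"the", "a", "of"})


def remove_stopwords(keywords, tags):
    # filterfalse yields the items for which the predicate is false,
    # i.e. the keywords that are NOT in the stopword set.
    kept = filterfalse(STOPWORDS.__contains__, keywords)
    # Merge with the tags and deduplicate; sorted only for stable output.
    return sorted(set(kept) | set(tags))


print(remove_stopwords(["the", "server", "memory"], ["heroku"]))
```

Because the stopword collection is a set, each membership check is a constant-time lookup instead of a scan through a list that is rebuilt for every word.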
I'd also take a look at Flask-Cache in order to memoize and cache functions and views as much as possible.
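As a rough illustration of what memoizing buys you, here is a sketch that uses the standard library's `functools.lru_cache` (Python 3) rather than Flask-Cache itself, so it is not that extension's API. `extract_keywords_uncached` is a hypothetical placeholder for the real NLTK pipeline, and the `maxsize` value is an arbitrary assumption:

```python
from functools import lru_cache

CALLS = {"count": 0}  # tracks how often the expensive function actually runs


def extract_keywords_uncached(text):
    # Hypothetical placeholder for the real extraction pipeline.
    CALLS["count"] += 1
    return tuple(sorted({w.lower() for w in text.split() if len(w) > 3}))


@lru_cache(maxsize=1024)  # bounded, so the cache itself cannot grow without limit
def extract_keywords(text):
    return extract_keywords_uncached(text)


first = extract_keywords("Flask serves NLTK keywords")
second = extract_keywords("Flask serves NLTK keywords")  # served from the cache
print(first == second, CALLS["count"])
```

Keeping the cache bounded matters here, since an unbounded cache would trade CPU time for exactly the kind of memory growth the question is about.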
Upvotes: 1