Jesvin Jose

Reputation: 23088

Tools for parsing natural language questions in realtime

"photos in washington" vs. "show me photos in washington" vs. "I wanna see all my photos in washington taken day before yesterday"

what: photos
entities: washington (I don't want to assume too much)
when: 2013-03-14
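To make the target output concrete, here is a minimal rule-based sketch in Python that maps the example queries to the conditions above. It is not a real parser: `KNOWN_TYPES`, `RELATIVE_DAYS`, and `parse_query` are hypothetical names, the "word after `in` is an entity" rule is a crude assumption, and the reference date of 2013-03-16 is inferred from the `when` value shown above.

```python
import re
from datetime import date, timedelta

# Hypothetical vocabularies; a real system would load its own.
KNOWN_TYPES = {"photos", "videos"}
RELATIVE_DAYS = {"day before yesterday": 2, "yesterday": 1, "today": 0}


def parse_query(text, today):
    """Turn a preset-style query into a dict of conditions."""
    lowered = text.lower()
    conditions = {"what": None, "entities": [], "when": None}

    # Try longer phrases first so "day before yesterday" wins over "yesterday".
    for phrase in sorted(RELATIVE_DAYS, key=len, reverse=True):
        if phrase in lowered:
            day = today - timedelta(days=RELATIVE_DAYS[phrase])
            conditions["when"] = day.isoformat()
            lowered = lowered.replace(phrase, " ")
            break

    tokens = re.findall(r"[a-z0-9']+", lowered)
    for i, tok in enumerate(tokens):
        if tok in KNOWN_TYPES and conditions["what"] is None:
            conditions["what"] = tok
        # Assumption: the word following "in" is a candidate entity.
        elif tok == "in" and i + 1 < len(tokens):
            conditions["entities"].append(tokens[i + 1])
    return conditions


result = parse_query(
    "I wanna see all my photos in washington taken day before yesterday",
    today=date(2013, 3, 16),  # assumed reference date
)
print(result)
```

Fluff such as "I wanna see" is simply ignored because it matches none of the rules, and nothing here depends on capitalization. The sketch leaves "washington" as an untyped entity rather than guessing that it is a place.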

I want to parse preset queries into conditions (like above). I want these qualities:

  1. I can extract relevant terms even in the presence of fluff ("I wanna see") and lowercase nouns
  2. The warm program can accept requests over HTTP, or lets me add some network communication myself
  3. The warm program responds within 50 ms and needs at most 500 MB of memory for reasonable sentences
  4. I am more experienced in Python, less so in Java
  5. Parser data structure is easy to handle
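Requirements 2 and 3 amount to paying the load cost once and keeping the process alive. A minimal sketch of that pattern using only the Python standard library follows; `load_parser`, `PARSER`, `ParseHandler`, and `make_server` are hypothetical names, and the "parser" is a stand-in that only splits on whitespace.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def load_parser():
    # Placeholder for the expensive step (loading a grammar or model).
    return lambda text: {"tokens": text.lower().split()}


PARSER = load_parser()  # loaded once, reused by every request


class ParseHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Read the sentence from the ?q=... query parameter.
        query = parse_qs(urlparse(self.path).query).get("q", [""])[0]
        body = json.dumps(PARSER(query)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def make_server(port=8000):
    return HTTPServer(("127.0.0.1", port), ParseHandler)
```

Calling `make_server().serve_forever()` starts the server, after which a request such as `GET /?q=photos%20in%20washington` returns the parse as JSON. The same shape should apply if the parser lives in a JVM behind a servlet; only the handler body changes.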

I use NLTK, but it's slow. I see StanfordNLP and OpenNLP as viable alternatives, but I find their program-start latency too high. I don't mind integrating them via servlets if I am left with no alternative.

Upvotes: 0

Views: 1309

Answers (1)

AaronD

Reputation: 1701

The Stanford Parser is a solid choice, and pretty well-supported (as research code goes). But it sounds like low latency is an important requirement for you, so I'd also suggest you look at the BUBS Parser (full disclosure - I'm one of the primary researchers working on BUBS).

I haven't compared directly to NLTK, but I think you may find that the Stanford Parser doesn't meet your performance needs. This paper found a total throughput of ~60 words/second (~2-3 sentences/second). Those timings are pretty old, so newer hardware will certainly improve that, but probably still won't come close to 50 ms latency.
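As a back-of-the-envelope check on that throughput figure, the snippet below converts words/second into an approximate per-sentence latency. It assumes parse time scales linearly with sentence length, which is a simplification, and `sentence_latency_ms` is just an illustrative helper. The 12-word case is the longest example query from the question.

```python
def sentence_latency_ms(words_per_second, words_in_sentence):
    """Rough latency estimate, assuming time scales linearly with length."""
    return 1000.0 * words_in_sentence / words_per_second


BUDGET_MS = 50
THROUGHPUT_WPS = 60  # the ~60 words/second figure quoted above

for words in (12, 20, 25):
    ms = sentence_latency_ms(THROUGHPUT_WPS, words)
    verdict = "within" if ms <= BUDGET_MS else "over"
    print(f"{words} words: ~{ms:.0f} ms ({verdict} the {BUDGET_MS} ms budget)")
```

At 60 words/second even the 12-word query comes out at roughly 200 ms, so hardware would need to be several times faster to get under 50 ms.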

As you note, startup time will be an issue with any parser - a high-accuracy model is necessarily quite large. And 500 MB is probably pretty tight too (I usually run BUBS with 1-1.2 GB). But once loaded, BUBS latency is generally in the neighborhood of 10 ms per sentence (for ~20-25-word sentences), and we can push the total throughput up around 2500 words/second before accuracy starts to drop off. I think those numbers might meet your performance needs, and I don't know of any other high-accuracy (F1 >= 88-89) parser that comes close in speed.

Note: the fastest results are with recent pruning models that aren't yet posted to the website, but I can get you a model if you need. Hope that helps, and if you have more questions, feel free to ask.

Upvotes: 0
