sujeet14108

QA system over a question-answer corpus

We have a question-answer corpus like the one shown below:

Q: Why did Lincoln issue the Emancipation Proclamation? 
A: The goal was to weaken the rebellion, which was led and controlled by slave owners.

Q: Who is most noted for his contributions to the theory of molarity and molecular weight?  
A: Amedeo Avogadro

Q: When did he drop John from his name? 
A: upon graduating from college

Q: What do beetles eat? 
A: Some are generalists, eating both plants and animals. Other beetles are highly specialised in their diet.


Consider the questions as queries and the answers as documents.
We have to build a system that, given a query that is semantically similar to one of the questions in the question corpus, retrieves the right document (the corresponding answer in the answer corpus).
Can anyone suggest an algorithm or a good way to approach building it?

Answers (2)

Wasi Ahmad

Your question is quite broad, and the task you are describing is challenging. I suggest reading about IR-based factoid question answering. This document references many state-of-the-art techniques, and reading it should give you several ideas.

Note that IR-based factoid QA and knowledge-based QA call for different approaches, so first decide which type of QA system you want to build.

Lastly, I suspect a simple document-matching technique won't be enough for QA, but you can try the simple Lucene approach @Debasis suggested and see how well it does.
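To make "simple document matching" concrete, here is a minimal plain-Java sketch (no Lucene) that scores a query against every stored question with TF-IDF weighted cosine similarity and returns the answer paired with the best-matching question. The class name `TfIdfQaMatcher`, its methods, the naive tokenizer and the smoothed IDF formula are all illustrative assumptions, not part of any library; treat it as a baseline to compare against, not a recommended design.

```java
import java.util.*;

/** Hypothetical helper: TF-IDF cosine matching of a query against stored questions. */
class TfIdfQaMatcher {
    private final List<String> questions;
    private final List<String> answers;
    private final Map<String, Integer> docFreq = new HashMap<>();
    private final List<Map<String, Double>> questionVectors = new ArrayList<>();

    TfIdfQaMatcher(List<String> questions, List<String> answers) {
        if (questions.size() != answers.size()) {
            throw new IllegalArgumentException("each question needs exactly one answer");
        }
        this.questions = questions;
        this.answers = answers;
        // Document frequency: in how many questions does each term appear?
        for (String q : questions) {
            for (String term : new HashSet<>(tokenize(q))) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        // Precompute one TF-IDF vector per stored question.
        for (String q : questions) {
            questionVectors.add(tfIdf(tokenize(q)));
        }
    }

    /** Lowercases and splits on non-alphanumeric characters (deliberately naive). */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Raw term frequency times a smoothed IDF, log((N + 1) / (df + 1)) + 1. */
    private Map<String, Double> tfIdf(List<String> tokens) {
        Map<String, Double> vec = new HashMap<>();
        int n = questions.size();
        for (String term : tokens) {
            double idf = Math.log((n + 1.0) / (docFreq.getOrDefault(term, 0) + 1.0)) + 1.0;
            vec.merge(term, idf, Double::sum); // adding idf once per occurrence gives tf * idf
        }
        return vec;
    }

    private static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            normA += e.getValue() * e.getValue();
        }
        for (double v : b.values()) normB += v * v;
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    /** Returns the answer paired with the most similar stored question, or null if none. */
    String bestAnswer(String query) {
        Map<String, Double> qv = tfIdf(tokenize(query));
        int best = -1;
        double bestScore = -1;
        for (int i = 0; i < questionVectors.size(); i++) {
            double s = cosine(qv, questionVectors.get(i));
            if (s > bestScore) { bestScore = s; best = i; }
        }
        return best < 0 ? null : answers.get(best);
    }

    public static void main(String[] args) {
        TfIdfQaMatcher m = new TfIdfQaMatcher(
            List.of("Why did Lincoln issue the Emancipation Proclamation?",
                    "Who is most noted for his contributions to the theory of molarity and molecular weight?",
                    "What do beetles eat?"),
            List.of("The goal was to weaken the rebellion, which was led and controlled by slave owners.",
                    "Amedeo Avogadro",
                    "Some are generalists, eating both plants and animals."));
        System.out.println(m.bestAnswer("What food do beetles eat?"));
    }
}
```

Because this only counts shared surface words, a paraphrase such as "What is the diet of beetles?" depends on overlapping words like "beetles"; it has no notion of synonyms, which is why purely lexical matching may fall short for semantically similar queries.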

Debasis

Treat each question and its answer (assuming there is only one) as a single Lucene document. Lucene documents are made up of fields, so when constructing a document, make the question the searchable field and store the answer alongside it. Once you retrieve the top-ranked questions for a query question, use the get method of the Document class to return their answers.

A code skeleton (fill in the rest yourself):

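// Written against the Lucene 5.x-8.x API.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.*;
import org.apache.lucene.store.Directory;

//Index
Directory directory = ...; // e.g. FSDirectory.open(...); fill this in yourself
IndexWriterConfig iwcfg = new IndexWriterConfig(new StandardAnalyzer());
IndexWriter writer = new IndexWriter(directory, iwcfg);
....
// For each (questionBody, answerBody) pair in the corpus:
Document doc = new Document();
// The question is analyzed and indexed (searchable) and also stored.
doc.add(new TextField("FIELD_QUESTION", questionBody, Field.Store.YES));
// The answer only needs to be stored so it can be read back with get().
doc.add(new StoredField("FIELD_ANSWER", answerBody));
writer.addDocument(doc);
...
writer.close(); // commit the index before searching it

// Search
IndexReader reader = DirectoryReader.open(directory);
IndexSearcher searcher = new IndexSearcher(reader);
...
// Parse the query question against the question field, using the same
// analyzer as at indexing time.
QueryParser parser = new QueryParser("FIELD_QUESTION", new StandardAnalyzer());
Query q = parser.parse(queryQuestion);
...
TopDocs topDocs = searcher.search(q, 10); // top-10 retrieved
// Accumulate the answers from the retrieved questions which
// are similar to the query (new) question.
StringBuilder buff = new StringBuilder();
for (ScoreDoc sd : topDocs.scoreDocs) {
    Document retrievedDoc = reader.document(sd.doc);
    buff.append(retrievedDoc.get("FIELD_ANSWER")).append("\n");
}
System.out.println("Generated answer: " + buff.toString());
reader.close();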
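To see what the Lucene skeleton does conceptually, here is a minimal plain-Java sketch of the same "one document per question-answer pair" idea: only the question text goes into an inverted index (term to document ids), the answer is merely stored, and a query returns the stored answers of the top-k questions. The class `QaIndex` and its methods are hypothetical, and ranking by the number of distinct shared query terms is a simplification assumed here; it is far cruder than Lucene's default ranking (BM25 in recent versions).

```java
import java.util.*;

/** Hypothetical in-memory analogue of the Lucene design: index questions, store answers. */
class QaIndex {
    // Stands in for a stored-only field: answer text per document id.
    private final List<String> storedAnswers = new ArrayList<>();
    // Inverted index over the question field only: term -> ids of documents containing it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Lowercases and splits on non-alphanumeric characters; returns distinct terms. */
    static Set<String> analyze(String text) {
        Set<String> terms = new LinkedHashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    /** Adds one question-answer pair as a single "document". */
    void add(String question, String answer) {
        int id = storedAnswers.size();
        storedAnswers.add(answer);
        for (String term : analyze(question)) {
            postings.computeIfAbsent(term, k -> new HashSet<>()).add(id);
        }
    }

    /** Returns the stored answers of the top-k questions by number of shared query terms. */
    List<String> search(String query, int k) {
        Map<Integer, Integer> overlap = new HashMap<>();
        for (String term : analyze(query)) {
            for (int docId : postings.getOrDefault(term, Set.of())) {
                overlap.merge(docId, 1, Integer::sum);
            }
        }
        List<Integer> ids = new ArrayList<>(overlap.keySet());
        // Higher overlap first; ties broken by insertion order (lower id first).
        ids.sort(Comparator.<Integer>comparingInt(id -> -overlap.get(id)).thenComparingInt(id -> id));
        List<String> results = new ArrayList<>();
        for (int i = 0; i < Math.min(k, ids.size()); i++) {
            results.add(storedAnswers.get(ids.get(i)));
        }
        return results;
    }

    public static void main(String[] args) {
        QaIndex index = new QaIndex();
        index.add("Why did Lincoln issue the Emancipation Proclamation?",
                  "The goal was to weaken the rebellion, which was led and controlled by slave owners.");
        index.add("When did he drop John from his name?", "upon graduating from college");
        index.add("What do beetles eat?", "Some are generalists, eating both plants and animals.");
        // prints [Some are generalists, eating both plants and animals.]
        System.out.println(index.search("What is the diet of beetles?", 1));
    }
}
```

In that example the Lincoln question also gets a nonzero score because it shares "the" with the query; without stopword removal or term weighting, this kind of noise is one reason plain term matching can be insufficient.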
