batman

Reputation: 703

Knowledge-based QA system not giving the most appropriate answer

I am working on a project that is essentially a knowledge-based question answering system. The system takes a query from the user, downloads the relevant documents from Wikipedia, strips all the HTML tags, and extracts the plain text. It then tokenizes each document into sentences and builds the term-document (TD) matrix, with the query passed in as one more sentence. Stemming is applied when the TD matrix is built. The TD matrix is then fed to the pLSA (Probabilistic Latent Semantic Analysis) algorithm. Finally, the system computes the cosine similarity between each document (sentence) vector and the query vector, and displays the sentence most similar to the query as the answer.

The problem is that it does display a result, but not the most relevant one. Where am I going wrong? Is the strategy I am following correct, or is there another algorithm that might help? Below are some of the questions and the answers my system returned:

What is photosynthesis?
ANSWER  1 :   The stroma contains stacks (grana) of thylakoids, which are the site of photosynthesis 

ANSWER  2 :   Factors leaf is the primary site of photosynthesis in plants 

ANSWER  3 :   Samuel Ruben and Martin Kamen used radioactive isotopes to determine that the oxygen liberated in photosynthesis came from the water 

ANSWER  4 :   In plants, algae and cyanobacteria, photosynthesis releases oxygen 

Another question

What is Artificial Intelligence?
ANSWER  1 :   the problem of creating 'artificial intelligence' will substantially be solved" 

ANSWER  2 :   37 The leading-edge definition of artificial intelligence research is changing over time 

ANSWER  3 :   Stories of these creatures and their fates discuss many of the same hopes, fears and ethical concerns that are presented by artificial intelligence 

ANSWER  4 :   History of artificial intelligence and Timeline of artificial intelligence Thinking machines and artificial beings appear in Greek myths , such as Talos of Crete , the bronze robot of Hephaestus , and Pygmalion's Galatea 13 Human likenesses believed to have intelligence were built in every major civilization 

Another question

Who is a hacker?

ANSWER  1 :   19 Hackers (short stories) Helba from the  

ANSWER  2 :   16 Rafael Núñez aka RaFa was a notorious most wanted hacker by the FBI since 2001 

ANSWER  3 :   Often, this type of 'white hat' hacker is called an ethical hacker 
ANSWER  4 :   Hackers also commonly use port scanners  

Yet another run

What is biology?
ANSWER  1 :   Molecular biology is the study of biology at a molecular level 

ANSWER  2 :   molecular biology studies the complex interactions of systems of biological molecules 

ANSWER  3 :   The similarities and differences between cell types are particularly relevant to molecular biology 

ANSWER  4 :   Contents History Foundations of modern biology 2 
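
The final ranking step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual code: it skips the Wikipedia download, the stemming, and the pLSA step entirely, and compares raw term-count vectors directly. The function names `tokenize`, `cosine`, and `rank_sentences` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (no stemming here)."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_sentences(query, sentences):
    """Return (score, sentence) pairs sorted by decreasing similarity to the query."""
    q_vec = Counter(tokenize(query))
    scored = [(cosine(q_vec, Counter(tokenize(s))), s) for s in sentences]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Three sentences taken from the outputs listed above.
sentences = [
    "Hackers also commonly use port scanners",
    "In plants, algae and cyanobacteria, photosynthesis releases oxygen",
    "Factors leaf is the primary site of photosynthesis in plants",
]
ranked = rank_sentences("What is photosynthesis?", sentences)
for score, sentence in ranked:
    print(f"{score:.3f}  {sentence}")
```

Note that in this sketch the stop word "is" contributes to the score just as much as "photosynthesis" does, so any sentence that mentions the query term can rank highly whether or not it defines it.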

Upvotes: 3

Views: 1226

Answers (2)

John Lehmann

Reputation: 8225

This is a well-studied problem called Question Answering (QA). I have provided a summary of QA in another answer. In particular, all of your examples fall under the category of "definition questions", according to TREC. I suggest perusing some of the papers returned by a search for "TREC definition questions" on Google or Google Scholar for ideas.

Upvotes: 2

Blacksad

Reputation: 15422

I think it will be difficult to improve your system if you stick with a purely statistical approach. From a statistical NLP standpoint, you are doing the right things. You may still fine-tune some parameters. To do that, you must build a training corpus in which you tell the system which answer is the right one for each question, and then see which value each parameter has to take to give you that answer.
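
A minimal sketch of such a tuning loop, assuming a single hypothetical parameter `alpha` (a length-normalization exponent in a toy overlap scorer, not anything taken from the asker's pLSA setup). The names `score`, `pick_answer`, `accuracy`, and `tune_alpha` are illustrative, and the second candidate sentence in the labeled example is made-up toy data.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score(question, sentence, alpha):
    """Toy scorer: number of shared terms divided by sentence length ** alpha."""
    q_terms = set(tokenize(question))
    s_terms = tokenize(sentence)
    if not s_terms:
        return 0.0
    overlap = sum(1 for t in s_terms if t in q_terms)
    return overlap / (len(s_terms) ** alpha)


def pick_answer(question, candidates, alpha):
    """Index of the highest-scoring candidate sentence."""
    return max(range(len(candidates)),
               key=lambda i: score(question, candidates[i], alpha))


def accuracy(labeled, alpha):
    """Fraction of labeled questions for which the chosen answer is the gold one."""
    hits = sum(1 for q, cands, gold in labeled
               if pick_answer(q, cands, alpha) == gold)
    return hits / len(labeled)


def tune_alpha(labeled, grid):
    """Grid search: return the alpha value with the best accuracy on the corpus."""
    return max(grid, key=lambda a: accuracy(labeled, a))


# Training corpus: (question, candidate sentences, index of the right answer).
labeled = [
    ("What is biology?",
     ["Molecular biology is the study of biology at a molecular level",
      "Biology is the study of life"],  # made-up sentence, for illustration only
     1),
]
best = tune_alpha(labeled, [0.0, 0.5, 1.0])
print(best, accuracy(labeled, best))
```

With a realistic corpus, the accuracy should be measured on questions held out from the tuning step, since a value chosen on the training questions will look better on them than it does on new ones.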

That being said, I don't think that fine-tuning parameters will improve your accuracy by more than 20% to 30%.

If you want to go further, you'll need a more semantic approach, and represent knowledge symbolically. Check for instance http://www.jfsowa.com/

Upvotes: 1
