Reputation: 255
I am looking for a tool/API in .NET that can roughly extract the keywords in a sentence. For example, if I have an article with the title "PIX: World's thinnest 15-inch laptop, Dell XPS 15z", I want to extract keywords such as Dell, XPS 15z, and laptop, so that I can search for those keywords in other articles and present the user with similar articles.
Any suggestions are appreciated.
Upvotes: 1
Views: 4019
Reputation: 1
I have been looking for this kind of tool also, and I found this page http://termcoord.wordpress.com/about/testing-of-term-extraction-tools/free-term-extractors/
You can choose any of the tools listed there. The page gave me a lot of options.
Upvotes: 0
Reputation: 41
You could also use grouping in regular expressions to extract the words around Dell.
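As a rough illustration of the grouping idea, here is a minimal sketch in Python. The pattern, the group names `brand` and `model`, and the helper `extract_brand_and_model` are all assumptions made for this example; it only works when you already know the brand you are looking for. The same approach should carry over to .NET's `Regex` class, where named groups are written `(?<name>...)` instead of `(?P<name>...)`.

```python
import re

# Hypothetical pattern: capture the known brand "Dell" plus up to two
# following word tokens, which we assume form the model name.
PATTERN = re.compile(r"(?P<brand>Dell)\s+(?P<model>\w+(?:\s+\w+)?)", re.IGNORECASE)

def extract_brand_and_model(title):
    """Return (brand, model) if the brand appears in the title, else None."""
    match = PATTERN.search(title)
    if match is None:
        return None
    return match.group("brand"), match.group("model")

title = "PIX: World's thinnest 15-inch laptop, Dell XPS 15z"
print(extract_brand_and_model(title))
```

This captures the words after the brand only; it does not discover keywords such as "laptop" on its own.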
Upvotes: 0
Reputation: 6805
If you want to search text and present related articles, you may well be interested in Lucene.NET. It indexes a body of text and accepts standard search-engine-style queries. It can also present results Google-style, for example by highlighting the search terms found in the document.
It is more work than using the algorithms Tarkus mentioned, but it will solve more of your problems and save you from having to write your own search engine (which is a non-trivial task).
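To give a feel for what "index a body of text and accept queries" means, here is a toy sketch in Python. It is not Lucene.NET's API and leaves out almost everything a real search library does (stemming, relevance scoring, highlighting). The stop-word list, the sample articles, and the function names are all made up for illustration.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "s"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms, dropping stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]

def build_index(articles):
    """Map each term to the set of article ids that contain it (an inverted index)."""
    index = defaultdict(set)
    for article_id, text in articles.items():
        for term in tokenize(text):
            index[term].add(article_id)
    return index

def search(index, query):
    """Rank article ids by how many distinct query terms they share."""
    scores = defaultdict(int)
    for term in set(tokenize(query)):
        for article_id in index.get(term, ()):
            scores[article_id] += 1
    return sorted(scores, key=lambda a: (-scores[a], a))

# Hypothetical sample data.
articles = {
    "a1": "Dell XPS 15z review: a thin 15-inch laptop",
    "a2": "Apple unveils a new phone",
    "a3": "Dell cuts laptop prices",
}
index = build_index(articles)
print(search(index, "PIX: World's thinnest 15-inch laptop, Dell XPS 15z"))
```

Using the whole title as the query sidesteps explicit keyword extraction: articles that share more terms with the title rank higher.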
Upvotes: 0
Reputation: 38378
Take a look here:
Upvotes: 2