user1905455

Reputation: 3

How to perform query expansion

I am working on a C# application where the user provides a set of words (typically fewer than 10) and I need to retrieve all the synonyms of these words. This is my first time working with dictionaries and the like. I need to know the steps to follow, and whether there is an existing dictionary that provides synonyms that I can integrate with my application, or an open source application or code that I can use.

Upvotes: 0

Views: 650

Answers (2)

Pete Garafano

Reputation: 4913

To answer your first question: you can find a thesaurus download here: http://wordpresscloaker.com/blog/download-free-english-thesaurus-format-txt.html

I make no promises about the quality, accuracy, legality, licensing for use, or completeness of that file. However, it will get you on your way. You need to extract mthesaur.txt and add it to your project folder (the code below reads it from C:\mthesaur.txt, so adjust the path to wherever you put the file).

Next, you need to read in the text file by doing the following:

using System;
using System.Collections.Generic;
using System.IO;

var dict = new Dictionary<string, string>();
// File.ReadLines streams the file line by line and closes it when the loop ends.
foreach (var line in File.ReadLines(@"C:\mthesaur.txt"))
{
    // Each line is "mainWord,synonym1,synonym2,...". Skip blank lines and lines without a comma.
    var firstComma = line.IndexOf(',');
    if (firstComma <= 0) continue;

    // The main word is everything before the first comma; the rest of the line is its synonyms.
    // (Cutting at the first comma avoids Replace, which would also strip matching text from inside the synonyms.)
    var mainWord = line.Substring(0, firstComma);
    var synonyms = line.Substring(firstComma + 1);

    // Use the main word as the key and the comma-separated synonyms as the value.
    if (dict.ContainsKey(mainWord))
    {
        Console.WriteLine("Attempted to add {0} to the dictionary but it already exists.", mainWord);
        continue;
    }
    dict.Add(mainWord, synonyms);
}

Now that we have everything in a key/value dictionary in C#, you can look up the synonyms for an entered word by key, or use LINQ to query across entries. For input, you could use a drop-down that contains all the keys from the dictionary (not recommended, as the list will be extremely large and hard for the user to navigate), a ListBox (better, easier to navigate), or a plain text search box. This doesn't completely answer your question, as there is nothing here about handling a GUI for the user, but it should get you well on your way.
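
As a rough illustration of the lookup step, here is a minimal sketch that expands a handful of user-supplied words with their synonyms. The class and method names (QueryExpansionSketch, Expand) are made up for this example, and the small in-memory dictionary stands in for the one loaded from mthesaur.txt; it assumes the same shape as above, with the main word as the key and a comma-separated synonym string as the value.

```csharp
using System;
using System.Collections.Generic;

public static class QueryExpansionSketch
{
    // Hypothetical helper: returns each input word followed by its synonyms,
    // skipping anything already returned (case-insensitively).
    public static IEnumerable<string> Expand(IEnumerable<string> words, IDictionary<string, string> dict)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (var word in words)
        {
            if (seen.Add(word)) yield return word;

            // Words that are not in the thesaurus are kept, just not expanded.
            string synonyms;
            if (!dict.TryGetValue(word, out synonyms)) continue;

            foreach (var synonym in synonyms.Split(','))
            {
                var trimmed = synonym.Trim();
                if (trimmed.Length > 0 && seen.Add(trimmed)) yield return trimmed;
            }
        }
    }

    public static void Main()
    {
        // Tiny stand-in for the dictionary built from the thesaurus file.
        // The case-insensitive comparer is an assumption; the loader above uses the default one.
        var dict = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "happy", "cheerful,content,glad" },
            { "fast", "quick,rapid,swift" }
        };

        var expanded = Expand(new[] { "happy", "fast", "zebra" }, dict);
        Console.WriteLine(string.Join(", ", expanded));
        // prints "happy, cheerful, content, glad, fast, quick, rapid, swift, zebra"
    }
}
```

With fewer than 10 input words, a direct TryGetValue per word is cheap; LINQ is only really needed if you want partial or fuzzy matches across the keys.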

Upvotes: 1

Bogdan Gavril MSFT

Reputation: 21448

If you use SQL Server full-text search or the underlying technology, Microsoft Search Server (there is a free Express SKU), you will find thesauri for multiple languages and other natural language processing tools. I am of course assuming you are working on an actual project, not on homework...

If you are more into open source, check out Lucene.net. It provides a search engine, and I'm pretty sure it has a thesaurus.

Upvotes: 0
