Ahimsa Afrizal

Reputation: 81

WordNet, Query Expansion, Step by Step

I want to build a project on query expansion using WordNet, but it's hard to find a step-by-step method for doing it.

Based on this article, I should take the following steps (assuming a sentence as input to the program):

  1. Tokenization
  2. Tagging part of speech
  3. Stemming word
  4. Word sense disambiguation
  5. Semantic similarity between the two synsets (this step is still confusing to me)

...and then we can conclude that the word with the highest score is the query expansion for the input. However, I'm still confused about how to perform each of these steps. Is there any source that covers them in more detail?
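For step 5, one score WordNet tooling commonly offers is path similarity, defined as 1 / (1 + length of the shortest is-a path between two synsets). The sketch below illustrates that formula on a small made-up hierarchy; the `HYPERNYM` table and both function names are hypothetical and are not part of WordNet or NLTK.

```python
# Hypothetical toy is-a hierarchy (child -> parent); not real WordNet data.
HYPERNYM = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "animal",
}


def ancestors_with_distance(node):
    """Map node and each of its ancestors to the number of is-a steps from node."""
    distances = {}
    steps = 0
    while node is not None:
        distances[node] = steps
        node = HYPERNYM.get(node)
        steps += 1
    return distances


def path_similarity(a, b):
    """Return 1 / (1 + shortest is-a path length), or None if unconnected."""
    up_a = ancestors_with_distance(a)
    up_b = ancestors_with_distance(b)
    common = set(up_a) & set(up_b)
    if not common:
        return None
    # The shortest path goes up from a to a shared ancestor, then down to b.
    shortest = min(up_a[n] + up_b[n] for n in common)
    return 1 / (1 + shortest)


print(path_similarity("dog", "wolf"))
print(path_similarity("dog", "cat"))
```

A concept compared with itself scores 1.0, and the score shrinks as the two concepts sit further apart in the hierarchy, which is why a larger score is read as "more similar".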

Upvotes: 3

Views: 5871

Answers (2)

Siddhant Saurabh

Reputation: 21

See if this resolves your problem. Here I tried using WordNet for query expansion:

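# Requires NLTK plus its tokenizer, POS tagger, and WordNet data
# (installable through nltk.download).
from nltk import pos_tag, word_tokenize
from nltk.corpus import wordnet
from nltk.stem import PorterStemmer
from nltk.wsd import lesk


# Map a Penn Treebank tag to the single-letter POS code WordNet expects.
def get_wordnet_pos(treebank_tag):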
    if treebank_tag.startswith('J'):
        return 'a'
    elif treebank_tag.startswith('V'):
        return 'v'
    elif treebank_tag.startswith('N'):
        return 'n'
    elif treebank_tag.startswith('R'):
        return 'r'
    else:
        return None


def query_expansion_wordnet(query):
    words = word_tokenize(query)
    print("words",words)

    pos_tags = pos_tag(words)
    print("pos_tags", pos_tags)

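    # Note: Porter stems (e.g. "happi" for "happy") are often not WordNet
    # entries, so lookups on stemmed words may return no synsets.
    stemmer = PorterStemmer()
    stemmed_words = [stemmer.stem(word) for word, tag in pos_tags]
    print("stemmed_words", stemmed_words)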


    expanded_queries = []
    for word, pos in zip(stemmed_words, pos_tags):
        pos = get_wordnet_pos(pos[1])
        print("************")
        print("word pos",word, pos)
        synsets = wordnet.synsets(word, pos=pos)
        if synsets:
            correct_synset = lesk(words, word, pos=pos)
            print("correct_synset", correct_synset)
            if correct_synset:
                max_similarity = 0
                most_similar_synset = None
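                # correct_synset is normally one of `synsets` and scores 1.0
                # against itself, so it tends to be the synset selected here.
                for synset in synsets: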
                    similarity = correct_synset.path_similarity(synset)
                    print("similarity", synset, similarity)
                    if similarity and similarity > max_similarity:
                        max_similarity = similarity
                        most_similar_synset = synset
                if most_similar_synset:
                    expanded_queries.append(most_similar_synset.lemma_names())
            else:
                expanded_queries.append([word])
        else:
            expanded_queries.append([word])
    return expanded_queries
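
The function above returns one list of lemma names per input word. As a minimal sketch of a follow-up step, assuming you want a single flat list of search terms, the helper below merges those lists, drops duplicates, and replaces the underscores WordNet uses in multi-word lemma names with spaces. The name `flatten_expansion` and the sample data are hypothetical and not part of NLTK.

```python
def flatten_expansion(expanded_queries):
    """Merge per-word lemma lists into one de-duplicated list of search terms."""
    seen = set()
    terms = []
    for lemma_names in expanded_queries:
        for name in lemma_names:
            # WordNet joins multi-word lemmas with underscores, e.g. "motor_vehicle".
            term = name.replace("_", " ").lower()
            if term not in seen:
                seen.add(term)
                terms.append(term)
    return terms


# Made-up data shaped like the return value of query_expansion_wordnet.
example = [["car", "auto", "automobile", "motor_vehicle"], ["repair", "mend", "fix"]]
print(" OR ".join(flatten_expansion(example)))
```

Order is preserved, so the terms from the first query word come first in the expanded query.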

Upvotes: 0

Ram Narasimhan

Reputation: 22496

Query expansion is a huge field in its own right within IR (Information Retrieval).

WordNet is also huge, so it is difficult to find a single set of step-by-step directions. However, there are plenty of very good resources. I got started by taking several examples from the web and trying them out myself.

Resources you will find useful in getting started:

  1. The WordNet site itself (with examples)
  2. The WordNet Wikipedia page
  3. Python Programming.net has a WordNet tutorial page
  4. Even if you don't know Python, I would highly recommend the O'Reilly book "Natural Language Processing with Python". Its website has tons of examples to get you started.

Hope that helps you get going.

Upvotes: 2
