Landon G

Reputation: 839

Natural Language Processing techniques for understanding contextual words

Take the following sentence:

I'm going to change the light bulb

Here, change means replace, as in someone is going to replace the light bulb. This could easily be solved by using a dictionary API or something similar. However, consider the following sentences:

I need to go the bank to change some currency

You need to change your screen brightness

In the first sentence, change no longer means replace, it means exchange; in the second sentence, change means adjust.

If you were trying to understand the meaning of change in these situations, what techniques would you use to extract the correct definition based on the context of the sentence? And what is this task called?

Keep in mind, the input would only be one sentence. So something like:

Screen brightness is typically too bright on most peoples computers.
People need to change the brightness to have healthier eyes.

is not what I'm trying to solve, because there you can use the previous sentence to establish the context. Also, this would be for lots of different words, not just the word change.

Appreciate the suggestions.

Edit: I'm aware that various embedding models can help gain insight into this problem. If this is your answer, how do you interpret the word embedding that is returned? These arrays can be 500+ elements long, which isn't practical to interpret directly.

Upvotes: 3

Views: 796

Answers (3)

cookiemonster

Reputation: 2054

Pretrained language models like BERT could be useful for this, as mentioned in another answer. These models generate a representation for each token based on its context.

Recent pretrained language models use wordpieces, but spaCy has an implementation that aligns them to natural language tokens. You can then, for example, check the similarity of different tokens based on their context. An example from https://explosion.ai/blog/spacy-transformers:

import spacy
import torch   # required by the transformer-backed pipeline
import numpy

nlp = spacy.load("en_trf_bertbaseuncased_lg")  # BERT-backed spaCy pipeline
apple1 = nlp("Apple shares rose on the news.")
apple2 = nlp("Apple sold fewer iPhones this quarter.")
apple3 = nlp("Apple pie is delicious.")
# "Apple" (token 0) is more similar across the two company contexts
# than between a company context and the food context.
print(apple1[0].similarity(apple2[0]))  # 0.73428553
print(apple1[0].similarity(apple3[0]))  # 0.43365782
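
The same idea could be applied to the change sentences from the question. A minimal sketch, reusing the pipeline loaded above; the helper find is just for looking up the token by its text, and the actual similarity values will depend on the model:

doc_bulb = nlp("I'm going to change the light bulb")
doc_money = nlp("I need to go to the bank to change some currency")
doc_screen = nlp("You need to change your screen brightness")

def find(doc, text):
    # return the first token whose surface form matches `text`
    return next(tok for tok in doc if tok.text == text)

# contextual similarity between the different occurrences of "change"
print(find(doc_bulb, "change").similarity(find(doc_money, "change")))
print(find(doc_money, "change").similarity(find(doc_screen, "change")))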

Upvotes: 0

polm23

Reputation: 15593

What you're trying to do is called Word Sense Disambiguation. It has been studied for many years, and while it's probably not the most popular problem, it remains a topic of active research. Even now, just picking the most common sense of a word is a strong baseline.

Word embeddings may be useful but their use is orthogonal to what you're trying to do here.
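
For reference, a minimal sketch of that most-frequent-sense baseline, using NLTK's WordNet interface (my choice of library for illustration, not something this answer prescribes; it assumes the WordNet corpus has been downloaded via nltk.download('wordnet')):

from nltk.corpus import wordnet as wn

def most_frequent_sense(word, pos=wn.VERB):
    senses = wn.synsets(word, pos=pos)
    # WordNet lists senses roughly in order of frequency,
    # so the baseline simply picks the first one
    return senses[0] if senses else None

print(most_frequent_sense("change"))
print(most_frequent_sense("change").definition())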

Here's a bit of example code from pywsd, a Python library with implementations of some classical techniques:

>>> from pywsd.lesk import simple_lesk
>>> sent = 'I went to the bank to deposit my money'
>>> ambiguous = 'bank'
>>> answer = simple_lesk(sent, ambiguous, pos='n')
>>> print(answer)
Synset('depository_financial_institution.n.01')
>>> print(answer.definition())
'a financial institution that accepts deposits and channels the money into lending activities'

The methods are mostly kind of old and I can't speak for their quality, but it's a good starting point at least.

Word senses are usually going to come from WordNet.
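
For instance, the candidate senses a disambiguator would be choosing between for change can be listed straight from WordNet (again via NLTK, purely as an illustration):

from nltk.corpus import wordnet as wn

# print the first few verb senses of "change" with their glosses
for synset in wn.synsets("change", pos=wn.VERB)[:5]:
    print(synset.name(), "-", synset.definition())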

Upvotes: 2

Sushant

Reputation: 469

I don't know how useful this is, but from my point of view, word vector embeddings are naturally separated, and a word's position in the sample space is closely related to its different uses. However, as you said, a word is often used in several contexts.


To address this, encoding techniques that use the context, such as continuous bag-of-words or continuous skip-gram models, are generally used to classify the usage of a word in a particular context, e.g. change meaning either exchange or adjust. The same idea is applied in LSTM- and RNN-based architectures, where the context is preserved over input sequences.


Interpreting word vectors directly isn't practical from a visualisation point of view; they only make sense in terms of 'relative distance' to other words in the sample space (see the sketch below). Another way is to maintain a matrix over the corpus in which the contextual uses of each word are represented. In fact, there's a neural network that uses a bidirectional language model to predict the upcoming word and then, at the end of the sentence, goes back and tries to predict the previous words. It's called ELMo. You should go through the ELMo paper and this blog.
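
As a rough sketch of both points (skip-gram/CBOW training and the relative-distance interpretation), here is what this could look like with gensim 4.x; the three-sentence corpus is just a placeholder, so the numbers it produces are essentially noise, but the access pattern is the same on real data:

from gensim.models import Word2Vec

# Placeholder corpus: in practice, use tokenized sentences from your own domain
sentences = [
    ["change", "the", "light", "bulb"],
    ["change", "some", "currency", "at", "the", "bank"],
    ["change", "your", "screen", "brightness"],
]

# sg=1 selects the skip-gram objective; sg=0 would select CBOW
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

# The 50 numbers per word aren't interpretable on their own;
# only relative distances to other words carry meaning
print(model.wv.similarity("change", "currency"))
print(model.wv.most_similar("change", topn=3))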


Naturally, the model learns from representative examples. So the better the training set you give it, with diverse uses of the same word, the better the model can learn to use context to attach meaning to the word. This is often how people solve their specific cases: by using domain-centric training data.

I think these could be helpful: Efficient Estimation of Word Representations in Vector Space

Upvotes: 1
