Arnold Klein
Arnold Klein

Reputation: 3086

Find negation of particular keywords in text

I am working on information extraction from medical texts (very new to NLP!). At the moment I am interested to find and extract the medications which are mentioned in a predefined list of drugs. For example, consider the text:

"John was prescribed aspirin due to hight temperature"

Thus, given the list of medications (in Python language):

list_of_meds = ['aspirin', 'ibuprofen', 'paracetamol']

The extracted drug is aspirin. That's fine.

Now consider another case:

"John was prescribed ibuprofen, because he could not tolerate paracetamol"

Now, if I extract the drugs using the list (for example with regular expression), then the extracted drugs are ibuprofen and paracetamol.

QUESTION How to separate actually prescribed and untolerated drugs? Is there a way to label prescribed (used) and other mentioned drugs?

Upvotes: 0

Views: 2332

Answers (2)

Adnan S
Adnan S

Reputation: 1882

This is a complex problem. To capture the nuances around negation, you need to step into the world of dependency parsing and relationship extraction. Couple of paths you can take to add sophistication to your current approach and the add-on by @Jordan are:

  1. Using a relationship extraction NLP library (e.g. Watson, Core-NLP, Spacy) that you train with example sentences like the one you gave to extract triplet relations like (John, prescribed, ibuprofen) and (John, not tolerate, paracetamol). This will require investment in annotating sample data.
  2. Rolling your own relationship extractor by starting with the dependency parse that shows how different parts of the sentence are related. This will require both programming time as well as training.

Handling negation in relations is not a solved problem. The state of the art around this is usually associated with sentiment analysis. An introduction on using dependency parsing to identify and handle negation is available at this Stanford NLP Sentiment Analysis using RNN page

Upvotes: 3

Jordan Mattiuzzo
Jordan Mattiuzzo

Reputation: 442

A way to overcome this would be predefining what word comes before the medicine name. So in your case, this would mean checking to see if either "prescribed" or "not tolerate" comes before the medicine name.

This is what I have come up with. Just replace the variable text = first with text = second if you want to try the second piece of text.

import string

list_of_meds = ['aspirin', 'ibuprofen', 'paracetamol']
first = "John was prescribed aspirin due to high temperature"
second = "John was prescribed ibuprofen, because he could not tolerate 
paracetamol"

text = first

for c in string.punctuation:                                                                                                     
    text = text.replace(c, "")
text = text.split(' ')
for i in text:
    if i in list_of_meds:
        index = text.index(i) - 1
        if text[index] == "prescribed":
            medicine = i
            break

Good luck!

Jordan.

----- EDIT -----

Use the variable medicine as the output, and you can use that variable from there.

Upvotes: 2

Related Questions