Remove negation token and return negated sentence in Spacy

Question

I want to use the dependency parser of spaCy to determine the scope of negation within my docs. I applied the dependency visualizer to the following string:

RT @trader $AAPL 2012 is ooopen to Talk about patents with GOOG definitely not the treatment Samsung got heh someURL

I am able to detect negation cues with

 negation_tokens = [tok for tok in doc if tok.dep_ == 'neg']

As a result, I see that "not" is the negation modifier of "got" in my string. Now I want to define the scope of the negation with the following:

negation_head_tokens = [token.head for token in negation_tokens]   
for token in negation_head_tokens:
    end = token.i
    start = token.head.i + 1
    negated_tokens = doc[start:end]
    print(negated_tokens)

This gives the following output:

 ooopen to Talk about patents with GOOG definitely not the treatment Samsung

Now that I have defined the scope, I want to add "not" to certain words, conditional on their POS tag:

list = ['ADJ', 'ADV', 'AUX', 'VERB']
for token in negated_tokens:
    for i in list:
        if token.pos_ == i:
            print('not'+token.text)

This gives the following:

 notooopen, notTalk, notdefinitely, notnot

I want to exclude notnot from my output and return

RT @trader $AAPL 2012 is notooopen to notTalk about patents with GOOG notdefinitely the treatment Samsung got heh someURL

How can I achieve this? Also, do you see any ways to make my script faster?

Full script:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp(u'RT @trader $AAPL 2012 is ooopen to Talk about patents with GOOG definitely not the treatment Samsung got heh someURL')
list = ['ADJ', 'ADV', 'AUX', 'VERB']

negation_tokens = [tok for tok in doc if tok.dep_ == 'neg']
negation_head_tokens = [token.head for token in negation_tokens]

for token in negation_head_tokens:
   end = token.i
   start = token.head.i + 1
   negated_tokens = doc[start:end]
   for token in negated_tokens:
      for i in list:
         if token.pos_ == i:
            print('not'+token.text)

Josh Friedlander · Accepted Answer

It's bad form to shadow Python built-ins like list, so I renamed it pos_list.
Since "not" is just a regular adverb, the simplest way to avoid prefixing it seems to be an explicit blacklist. Maybe there is a more "linguistic" way to do it.
I slightly sped up your inner loop by replacing the loop over the POS tags with a single membership test (token.pos_ in pos_list).
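One candidate for a more "linguistic" check is to skip the cue by its dependency label instead of its text: the negation cue is exactly the token whose dep_ is 'neg', so no word list is needed. A minimal sketch, assuming tokens expose pos_ and dep_ the way spaCy tokens do; the helper name should_prefix and the constant POS_TAGS are made up for illustration and are not part of spaCy:

```python
# Hypothetical helper (not spaCy API): decide whether a token that lies
# inside the negation scope should receive the "not" prefix.
POS_TAGS = frozenset({'ADJ', 'ADV', 'AUX', 'VERB'})  # same tags as pos_list


def should_prefix(token, pos_tags=POS_TAGS):
    # Exclude the negation cue through its dependency label rather than its
    # text, so any cue the parser labels 'neg' is skipped without a blacklist.
    return token.pos_ in pos_tags and token.dep_ != 'neg'
```

This relies entirely on the parser's labels, so a cue that the parser does not tag as 'neg' would still be prefixed. The code that follows keeps the text blacklist.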

Code:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp(u'RT @trader $AAPL 2012 is ooopen to Talk about patents with GOOG definitely not the treatment Samsung got heh someURL')

pos_list = ['ADJ', 'ADV', 'AUX', 'VERB']
negation_tokens = [tok for tok in doc if tok.dep_ == 'neg']
# texts of the negation cues themselves, so they are not prefixed with "not"
blacklist = [token.text for token in negation_tokens]
negation_head_tokens = [token.head for token in negation_tokens]
new_doc = []

# collect the indices of all tokens that fall inside a negation scope
# (also works when there are several negations, or none at all)
negated_indices = set()
for token in negation_head_tokens:
    end = token.i
    start = token.head.i + 1
    negated_indices.update(range(start, end))

for token in doc:
    # or you can leave out the blacklist and test against
    # [token.text for token in negation_tokens] directly here
    if token.i in negated_indices and token.pos_ in pos_list and token.text not in blacklist:
        new_doc.append('not' + token.text)
    else:
        new_doc.append(token.text)
print(' '.join(new_doc))

> RT @trader $ AAPL 2012 is notooopen to notTalk about patents with GOOG notdefinitely not the treatment Samsung got heh someURL
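
The output above still contains the cue "not", while the question asked for it to be removed. Below is a sketch of one way to drop it and to handle several negations in a single pass. It assumes only the token attributes already used above (i, text, pos_, dep_, head); the function name negate_sentence and its parameters are illustrative, not spaCy API:

```python
def negate_sentence(doc, pos_tags=('ADJ', 'ADV', 'AUX', 'VERB'), prefix='not'):
    """Return the text of doc with in-scope words prefixed and in-scope cues dropped."""
    pos_tags = frozenset(pos_tags)

    # Same scope rule as above: from the token after the head's head
    # up to (but not including) the negated head, for every negation cue.
    in_scope = set()
    for tok in doc:
        if tok.dep_ == 'neg':
            head = tok.head
            in_scope.update(range(head.head.i + 1, head.i))

    words = []
    for tok in doc:
        if tok.i not in in_scope:
            words.append(tok.text)           # outside any scope: unchanged
        elif tok.dep_ == 'neg':
            continue                         # in-scope negation cue: drop it
        elif tok.pos_ in pos_tags:
            words.append(prefix + tok.text)  # in-scope word with a selected POS tag
        else:
            words.append(tok.text)
    return ' '.join(words)
```

Calling print(negate_sentence(doc)) on the doc above should give the sentence without "not", provided the parser assigns the same tags and dependencies as in the run shown. A cue that lies outside every scope is left in place so that the negation is not silently lost. Tokens are joined with single spaces here, so "$AAPL" would still come out as "$ AAPL"; spaCy tokens also carry whitespace_, which could be used instead to keep the original spacing.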
