user2293224

Reputation: 2220

Python: Extracting subject and its dependent phrases from text

I am trying to follow the thread "How to extract subjects in a sentence and their respective dependent phrases?". Like the asker there, I want to extract each subject and its dependent phrases from a text.

import spacy
from textpipeliner import PipelineEngine, Context
from textpipeliner.pipes import *

text = 'No Offline Maps! It used to have offline maps but they disappeared. It now has a menu option to watch a video in exchange for maps but it never downloads the map. Makes the app useless to me.'

pipes_structure = [
    SequencePipe([
        FindTokensPipe("VERB/nsubj/*"),
        NamedEntityFilterPipe(),
        NamedEntityExtractorPipe()
    ]),
    FindTokensPipe("VERB"),
    AnyPipe([
        SequencePipe([
            FindTokensPipe("VBD/dobj/NNP"),
            AggregatePipe([
                NamedEntityFilterPipe("GPE"),
                NamedEntityFilterPipe("PERSON")
            ]),
            NamedEntityExtractorPipe()
        ]),
        SequencePipe([
            FindTokensPipe("VBD/**/*/pobj/NNP"),
            AggregatePipe([
                NamedEntityFilterPipe("LOC"),
                NamedEntityFilterPipe("PERSON")
            ]),
            NamedEntityExtractorPipe()
        ])
    ])
]

engine = PipelineEngine(pipes_structure, Context(text), [0, 1, 2])
engine.process()

When I run the above code, it throws the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-22-5f5a5c9e8e51> in <module>()
----> 1 engine = PipelineEngine(pipes_structure, Context(text), [0, 1, 2])
      2 engine.process()

~/anaconda3/lib/python3.6/site-packages/textpipeliner/context.py in __init__(self, doc)
      4         self._current_sent_idx = -1
      5         self._paragraph = self._sents[0:9]
----> 6         for s in doc.sents:
      7             self._sents.append(s)
      8         self.doc = doc

AttributeError: 'str' object has no attribute 'sents'

I am not sure where I am making the mistake. Could anyone help me fix the issue?

Upvotes: 1

Views: 1149

Answers (2)

mazore

Reputation: 1024

It looks like you are passing a plain string (the text variable) to Context in this line:

engine = PipelineEngine(pipes_structure, Context(text), [0, 1, 2])

Replace the line that assigns text with

nlp = spacy.load("en")
text = nlp('No Offline Maps! It used to have offline maps but they disappeared. It now has a menu option to watch a video in exchange for maps but it never downloads the map. Makes the app useless to me.')

as this is what they do in the post you referenced.

This way text is no longer a string but the parsed document that nlp returns (a spaCy Doc, which has the sents attribute that Context iterates over), so the PipelineEngine(...) line second from the end works.
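To make the cause concrete, here is a minimal sketch that reproduces the same AttributeError without spaCy or textpipeliner installed. SketchContext and FakeSentenceDoc are hypothetical stand-ins I made up for illustration; SketchContext only mirrors the loop shown in the traceback and is not the library's actual class.

```python
class FakeSentenceDoc:
    """Hypothetical stand-in for a parsed document: it exposes a .sents list."""

    def __init__(self, text):
        # Naive split on '.', for illustration only; real sentence
        # segmentation in spaCy is far more robust than this.
        self.sents = [s.strip() for s in text.split(".") if s.strip()]


class SketchContext:
    """Hypothetical simplification of the constructor seen in the traceback."""

    def __init__(self, doc):
        self._sents = []
        for s in doc.sents:  # fails here when doc is a plain str
            self._sents.append(s)
        self.doc = doc


raw = "It used to have offline maps. It never downloads the map."

# Passing the raw string reproduces the error from the question.
try:
    SketchContext(raw)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Passing an object that has .sents works.
ctx = SketchContext(FakeSentenceDoc(raw))
sentence_count = len(ctx._sents)

print(error_message)
print(sentence_count)
```

The only thing the constructor needs from its argument is a sents attribute, which a str does not have and a parsed document does.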

Upvotes: 1

Konrad Talik

Reputation: 916

Interesting library.

Context needs to be given a different kind of object: a parsed spaCy document rather than a plain string. The error says so explicitly. See the package's official example:

nlp = spacy.load("en")
text = nlp('No Offline Maps! It used to have offline maps but they disappeared. It now has a menu option to watch a video in exchange for maps but it never downloads the map. Makes the app useless to me.')
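If you want to guard against passing a raw string again, one option is a small helper that parses only when needed. This is a sketch under assumptions: ensure_parsed, _StubDoc and stub_nlp are hypothetical names of mine, not part of spaCy or textpipeliner, and the stub exists only so the snippet runs without a model installed.

```python
def ensure_parsed(text_or_doc, nlp):
    """Return an object with .sents, calling nlp only on raw strings.

    Hypothetical helper for illustration; nlp is any callable that turns
    a str into a parsed document (for example a loaded spaCy pipeline).
    """
    if isinstance(text_or_doc, str):
        return nlp(text_or_doc)
    if not hasattr(text_or_doc, "sents"):
        raise TypeError("expected a str or an object with a .sents attribute")
    return text_or_doc


class _StubDoc:
    """Hypothetical stand-in for a parsed document."""

    def __init__(self, text):
        # Naive sentence split, for illustration only.
        self.sents = [s.strip() for s in text.split(".") if s.strip()]


def stub_nlp(text):
    """Hypothetical stand-in for a loaded spaCy pipeline."""
    return _StubDoc(text)


parsed = ensure_parsed("Makes the app useless to me.", stub_nlp)
same = ensure_parsed(parsed, stub_nlp)  # already parsed, returned unchanged
print(len(parsed.sents), same is parsed)
```

With real spaCy you would pass the loaded pipeline as nlp and hand the result to Context. Note that, as far as I know, newer spaCy releases no longer accept the "en" shortcut, so there you would load an installed model by its full name, such as "en_core_web_sm".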

Upvotes: 1
