Parsing multiple sentences with MaltParser using NLTK

Question

There have been many MaltParser and/or NLTK related questions:

Now, there's a more stabilized version of MaltParser API in NLTK: https://github.com/nltk/nltk/pull/944 but there are issues when it comes to parsing multiple sentences at the same time.

Parsing one sentence at a time seems fine:

_path_to_maltparser = '/home/alvas/maltparser-1.8/dist/maltparser-1.8/'
_path_to_model= '/home/alvas/engmalt.linear-1.7.mco'     
>>> mp = MaltParser(path_to_maltparser=_path_to_maltparser, model=_path_to_model)
>>> sent = 'I shot an elephant in my pajamas'.split()
>>> sent2 = 'Time flies like banana'.split()
>>> print(mp.parse_one(sent).tree())
(pajamas (shot I) an elephant in my)

But parsing a list of sentences doesn't return a DependencyGraph object:

_path_to_maltparser = '/home/alvas/maltparser-1.8/dist/maltparser-1.8/'
_path_to_model= '/home/alvas/engmalt.linear-1.7.mco'     
>>> mp = MaltParser(path_to_maltparser=_path_to_maltparser, model=_path_to_model)
>>> sent = 'I shot an elephant in my pajamas'.split()
>>> sent2 = 'Time flies like banana'.split()
>>> print(mp.parse_one(sent).tree())
(pajamas (shot I) an elephant in my)
>>> print(next(mp.parse_sents([sent,sent2])))
 
>>> print(next(next(mp.parse_sents([sent,sent2]))))
[{u'address': 0,
  u'ctag': u'TOP',
  u'deps': [2],
  u'feats': None,
  u'lemma': None,
  u'rel': u'TOP',
  u'tag': u'TOP',
  u'word': None},
 {u'address': 1,
  u'ctag': u'NN',
  u'deps': [],
  u'feats': u'_',
  u'head': 2,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NN',
  u'word': u'I'},
 {u'address': 2,
  u'ctag': u'NN',
  u'deps': [1, 11],
  u'feats': u'_',
  u'head': 0,
  u'lemma': u'_',
  u'rel': u'null',
  u'tag': u'NN',
  u'word': u'shot'},
 {u'address': 3,
  u'ctag': u'AT',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'AT',
  u'word': u'an'},
 {u'address': 4,
  u'ctag': u'NN',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NN',
  u'word': u'elephant'},
 {u'address': 5,
  u'ctag': u'NN',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NN',
  u'word': u'in'},
 {u'address': 6,
  u'ctag': u'NN',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NN',
  u'word': u'my'},
 {u'address': 7,
  u'ctag': u'NNS',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NNS',
  u'word': u'pajamas'},
 {u'address': 8,
  u'ctag': u'NN',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NN',
  u'word': u'Time'},
 {u'address': 9,
  u'ctag': u'NNS',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NNS',
  u'word': u'flies'},
 {u'address': 10,
  u'ctag': u'NN',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NN',
  u'word': u'like'},
 {u'address': 11,
  u'ctag': u'NN',
  u'deps': [3, 4, 5, 6, 7, 8, 9, 10],
  u'feats': u'_',
  u'head': 2,
  u'lemma': u'_',
  u'rel': u'dep',
  u'tag': u'NN',
  u'word': u'banana'}]

Why is that using parse_sents() don't return an iterable of parse_one?

I could however, just get lazy and do:

_path_to_maltparser = '/home/alvas/maltparser-1.8/dist/maltparser-1.8/'
_path_to_model= '/home/alvas/engmalt.linear-1.7.mco'     
>>> mp = MaltParser(path_to_maltparser=_path_to_maltparser, model=_path_to_model)
>>> sent1 = 'I shot an elephant in my pajamas'.split()
>>> sent2 = 'Time flies like banana'.split()
>>> sentences = [sent1, sent2]
>>> for sent in sentences:
>>> ...    print(mp.parse_one(sent).tree())

But this is not the solution I'm looking for. My question is how to answer why doesn't the parse_sent() return an iterable of parse_one(). and how could it be fixed in the NLTK code?

After @NikitaAstrakhantsev answered, I've tried it outputs a parse tree now but it seems to be confused and puts both sentences into one before parsing it.

# Initialize a MaltParser object with a pre-trained model.
mp = MaltParser(path_to_maltparser=path_to_maltparser, model=path_to_model) 
sent = 'I shot an elephant in my pajamas'.split()
sent2 = 'Time flies like banana'.split()
# Parse a single sentence.
print(mp.parse_one(sent).tree())
print(next(next(mp.parse_sents([sent,sent2]))).tree())

[out]:

(pajamas (shot I) an elephant in my)
(shot I (banana an elephant in my pajamas Time flies like))

From the code it seems to be doing something weird: https://github.com/nltk/nltk/blob/develop/nltk/parse/api.py#L45

Why is it that the parser abstract class in NLTK is swooshing two sentences into one before parsing? Am I calling the parse_sents() incorrectly? If so, what is the correct way to call parse_sents()?

Nikita Astrakhantsev · Accepted Answer

As I see in your code samples, you don't call tree() in this line

>>> print(next(next(mp.parse_sents([sent,sent2]))))

while you do call tree() in all cases with parse_one().

Otherwise I don't see the reason why it could happen: parse_one() method of ParserI isn't overridden in MaltParser and everything it does is simply calling parse_sents() of MaltParser, see the code.

Upd: The line you're talking about isn't called, because parse_sents() is overridden in MaltParser and is directly called.

The only guess I have now is that java lib maltparser doesn't work correctly with input file containing several sentences (I mean this block - where java is run). Maybe original malt parser has changed the format and now it is not ' '. Unfortunately, I can't run this code by myself, because maltparser.org is down for the second day. I checked that the input file has expected format (sentences are separated by double endline), so it is very unlikely that python wrapper merges sentences.

Parsing multiple sentences with MaltParser using NLTK

Answers (1)

Related Questions