nizam uddin

Reputation: 341

How to parse more than one sentence from a text file using the Stanford dependency parser?

I have a text file with many lines, and I want to parse all of its sentences. The whole file is read, but only the first sentence is parsed, and I am not sure where I am making a mistake.

import nltk
from nltk.parse.stanford import StanfordDependencyParser
dependency_parser = StanfordDependencyParser(model_path=r"edu\stanford\lp\models\lexparser\englishPCFG.ser.gz")
txtfile = open('sample.txt', encoding="latin-1")
s = txtfile.read()
print(s)
result = dependency_parser.raw_parse(s)
for i in result:
    print(list(i.triples()))

But it gives the parse triples for only the first sentence, not for the other sentences. Any help?

'i like this computer'
'The great Buddha, the .....'
'My Ashford experience .... great experience.'


[[(('i', 'VBZ'), 'nsubj', ("'", 'POS')), (('i', 'VBZ'), 'nmod', ('computer', 'NN')), (('computer', 'NN'), 'case', ('like', 'IN')), (('computer', 'NN'), 'det', ('this', 'DT')), (('computer', 'NN'), 'case', ("'", 'POS'))]]

Upvotes: 0

Views: 1021

Answers (2)

alvas

Reputation: 122300

Try:

from nltk.parse.stanford import StanfordDependencyParser
dependency_parser = StanfordDependencyParser(model_path=r"edu\stanford\lp\models\lexparser\englishPCFG.ser.gz")

# Read the file as a list of lines, one sentence per line
with open('sample.txt') as fin:
    sents = fin.readlines()
# raw_parse_sents yields one iterator of parses per input sentence
results = dependency_parser.raw_parse_sents(sents)
for parses in results:
    for parse in parses:
        print(list(parse.triples()))

Do check the docstrings or the demo code in the repository for examples; they're usually very helpful.
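One detail worth watching: readlines() keeps the trailing newline on every line and also returns blank lines, which are not sentences. A minimal sketch of cleaning the list before handing it to raw_parse_sents might look like the following. The helper name read_sentences and the sample sentences are made up for illustration, and io.StringIO only stands in for the real file so the snippet runs on its own.

```python
import io


def read_sentences(fin):
    """Return the non-empty lines of an open text file, stripped of whitespace.

    Hypothetical helper, not part of NLTK.
    """
    return [line.strip() for line in fin if line.strip()]


# io.StringIO stands in for open('sample.txt') here.
sample = io.StringIO("i like this computer\n\nThis laptop is fast.\n")
sents = read_sentences(sample)
print(sents)
```

The resulting list can then be passed to raw_parse_sents in place of the raw readlines() output.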

Upvotes: 0

Reut Sharabani

Reputation: 31339

You have to split the text first. You're currently parsing the literal text you posted, quotes and all. This is evident from this part of the parsing result: ("'", 'POS')

To do that, it looks like you can use ast.literal_eval on each line. Note that an unescaped apostrophe inside a line (in a word like "don't") makes the line an invalid string literal, so literal_eval will fail on it, and you'll have to strip the surrounding quotes yourself with something like line = line.strip()[1:-1]:

import ast
from nltk.parse.stanford import StanfordDependencyParser
dependency_parser = StanfordDependencyParser(model_path=r"edu\stanford\lp\models\lexparser\englishPCFG.ser.gz")

with open('sample.txt', encoding="latin-1") as f:
    # each non-blank line is a quoted string literal; evaluate it to get the bare sentence
    lines = [ast.literal_eval(line.strip()) for line in f if line.strip()]

# parse each sentence separately and keep every result
parsed_lines = []
for line in lines:
    parsed_lines.append(list(dependency_parser.raw_parse(line)))

# now parsed_lines should contain the parsed lines from the file
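To make the apostrophe caveat concrete, here is a rough sketch that tries ast.literal_eval first and falls back to slicing off the outer quotes when the line is not a valid string literal. The function name unquote is an invented helper, not something from NLTK, and the sample lines are only illustrative.

```python
import ast


def unquote(line):
    """Return the sentence from one quoted line, without the wrapping quotes.

    Hypothetical helper for illustration only.
    """
    text = line.strip()
    try:
        value = ast.literal_eval(text)
        if isinstance(value, str):
            return value
    except (ValueError, SyntaxError):
        # Not a valid Python literal, e.g. an unescaped apostrophe inside.
        pass
    # Fallback: drop one pair of matching outer quotes, if present.
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "'\"":
        return text[1:-1]
    return text


print(unquote("'i like this computer'\n"))
print(unquote("'I don't like this computer'\n"))
```

The second call is the case where literal_eval alone would raise a SyntaxError; the fallback keeps the inner apostrophe and removes only the outer quotes.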

Upvotes: 1
