bob

Reputation: 41

How to flatten a parse tree and store it in a string for further string operations (Python, NLTK)

I am trying to get a flat tree from a tree structure like the one given below.

[Image: parse tree]

I want to get this whole tree as a single-line string like the following, without getting the "Bad tree detected" error:

( (S (NP-SBJ (NP (DT The) (JJ high) (JJ seven-day) )(PP (IN of) (NP (DT the) (CD 400) (NNS money) )))(VP (VBD was) (NP-PRD (CD 8.12) (NN %) )(, ,) (ADVP (RB down) (PP (IN from) (NP (CD 8.14) (NN %) ))))(. .) ))

Upvotes: 1

Views: 7932

Answers (4)

caspillaga

Reputation: 563

NLTK can do this directly:

flat_tree = tree._pformat_flat("", "()", False)

tree.pprint() and str(tree) both call this method internally, but add extra logic to split the result across multiple lines when it is too long. Note that the leading underscore marks _pformat_flat as a private method, so it may change between NLTK versions.
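If you would rather not call a private method, here is a minimal sketch that should give the same one-line result through the public pformat() method. It assumes NLTK 3, where pformat() returns the one-line form whenever it fits within margin; the example tree and the large margin value are illustrative assumptions, not part of the original answer.

```python
from nltk.tree import Tree

# Illustrative tree; any nltk.tree.Tree should behave the same way.
tree = Tree.fromstring("(S (NP (DT The) (NN cat)) (VP (VBD sat)))")

# pformat() only wraps when the one-line form is longer than `margin`,
# so a very large margin keeps everything on a single line.
flat_tree = tree.pformat(margin=10**9)
print(flat_tree)
```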

Upvotes: 1

Truong-Son

Reputation: 106

You can convert the tree into a string using the str function, then split and join as follows:

parse_string = ' '.join(str(tree).split())

print(parse_string)
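To see why this works without needing a parsed tree at hand, here is a small sketch on a hand-written multi-line bracketed string (an illustrative stand-in for str(tree)). Calling split() with no arguments splits on any run of whitespace, including newlines and indentation, so joining the pieces with single spaces collapses the layout into one line.

```python
# Stand-in for the multi-line output of str(tree).
multi_line = """(S
  (NP (DT The) (NN cat))
  (VP (VBD sat))
  (. .))"""

# split() drops all runs of whitespace; join() puts back single spaces.
parse_string = " ".join(multi_line.split())
print(parse_string)  # (S (NP (DT The) (NN cat)) (VP (VBD sat)) (. .))
```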

Upvotes: 5

bob

Reputation: 41

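NLTK provides functions for tree manipulation and node extraction:

from nltk.tree import Tree

# `trees` is assumed to be an iterable of parse trees
for tr in trees:
    tr1 = str(tr)               # bracketed string form of the tree
    s1 = Tree.fromstring(tr1)   # rebuild a Tree from that string
    s2 = s1.productions()       # grammar productions used in the tree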
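As a self-contained illustration of the same calls, here is a short sketch on a single hand-written tree (the bracketed string is an assumption made up for the example). It shows that a tree rebuilt from its string form is equal to the original, and what leaves() and productions() give back.

```python
from nltk.tree import Tree

# Illustrative tree built from a bracketed string.
tree = Tree.fromstring("(S (NP (DT The) (NN cat)) (VP (VBD sat)))")

# Round trip: string form back into an equal Tree.
rebuilt = Tree.fromstring(str(tree))

# Words at the leaves, and the grammar productions as readable strings.
words = rebuilt.leaves()
rules = [str(p) for p in rebuilt.productions()]

print(words)
for rule in rules:
    print(rule)
```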

Upvotes: 3

RossHochwert

Reputation: 170

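The documentation describes a pprint() method that renders the tree as a bracketed string, wrapping longer trees across several indented lines. (In NLTK 3, pprint() only prints the result; pformat() is the method that returns it as a string.)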

Parsing this sentence:

string = "My name is Ross and I am cool. What's going on world? I'm looking for friends."

And then calling pprint() yields the following:

u"(NP+SBAR+S\n  (S\n    (NP (PRP$ my) (NN name))\n    (VP\n      (VBZ is)\n      (NP (NNP Ross) (CC and) (PRP I) (JJ am) (NN cool.))\n      (SBAR\n        (WHNP (WP What))\n        (S+VP (VBZ 's) (VBG going) (NP (IN on) (NN world)))))\n    (. ?))\n  (S\n    (NP (PRP I))\n    (VP (VBP 'm) (VBG looking) (PP (IN for) (NP (NNS friends))))\n    (. .)))"

From this point, if you wish to remove the indentation and newlines, you can use the following split and join:

# NLTK 3: pformat() returns the string; in older versions pprint() did
splitted = tree.pformat().split()
flat_tree = ' '.join(splitted)

Executing that yields this for me:

u"(NP+SBAR+S (S (NP (PRP$ my) (NN name)) (VP (VBZ is) (NP (NNP Ross) (CC and) (PRP I) (JJ am) (NN cool.)) (SBAR (WHNP (WP What)) (S+VP (VBZ 's) (VBG going) (NP (IN on) (NN world))))) (. ?)) (S (NP (PRP I)) (VP (VBP 'm) (VBG looking) (PP (IN for) (NP (NNS friends)))) (. .)))"

Upvotes: 2
