Sudo

Reputation: 651

Extract clauses from sentence in python

I need to list the clauses in given sentences. I am writing my own grammar rules to parse the clauses out of each sentence. The result I obtained is:

*************************************************
(S
  (CLAUSE
    (VP
      (VP they/PRP were/VBD delivered/VBN promptly/RB)
      and/CC
      (VP a/DT very/RB))
    (NP (NP good/JJ value/NN) and/CC (NP excellent/NN)))
  (CLAUSE
    (VP all/DT)
    (NP (NP around/IN (NP slipper/NN)) (NP with/IN (NP traction/NN))))
  ./.)
*************************************************

From the result above, I want the clauses listed as the following statements:

they were delivered promptly and a very good value and excellent

all around slipper with traction.

I've tried using flatten and chomsky_normal_form but couldn't get the desired result. How can I list each clause on a single line, getting rid of the tags?

Upvotes: 1

Views: 2117

Answers (1)

Falko

Reputation: 17867

Since everything you want to extract from your string s appears to be lowercase, you can apply one of the following one-liners:

Python generator expression

print(' '.join(''.join(c for c in s if 'a' <= c <= 'z' or c == ' ').split()))

It joins (''.join) all characters that are between "a" and "z" or are a space. To collapse runs of consecutive spaces, it splits the result on whitespace and joins the pieces again with a single space as separator.

Regular expression

If you prefer regular expressions (import re), this even shorter statement yields the same result for your input:

import re
print(' '.join(re.findall(r'[a-z]+', s)))
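To check that the two one-liners agree, here is a self-contained Python 3 sketch that pastes the parser output from the question into s and runs both. Note that on the whole string they put every clause on one line, which is why the per-clause version below is needed.

```python
import re

# The parser output from the question, pasted in as a plain string.
s = """*************************************************
(S
  (CLAUSE
    (VP
      (VP they/PRP were/VBD delivered/VBN promptly/RB)
      and/CC
      (VP a/DT very/RB))
    (NP (NP good/JJ value/NN) and/CC (NP excellent/NN)))
  (CLAUSE
    (VP all/DT)
    (NP (NP around/IN (NP slipper/NN)) (NP with/IN (NP traction/NN))))
  ./.)
*************************************************"""

# Variant 1: keep only lowercase letters and spaces, then collapse runs of spaces.
by_filter = ' '.join(''.join(c for c in s if 'a' <= c <= 'z' or c == ' ').split())

# Variant 2: collect every maximal run of lowercase letters.
by_regex = ' '.join(re.findall(r'[a-z]+', s))

print(by_filter)
# prints: they were delivered promptly and a very good value and excellent all around slipper with traction
print(by_filter == by_regex)
# prints: True
```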

Edit

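If you want to process each clause individually, you can split the whole string s at each "CLAUSE" label and then apply the same code to each part (except the first one, which only holds the text before the first clause):

for part in s.split("CLAUSE")[1:]:
    # each part holds one clause's words plus its uppercase tags, which the regex skips
    print(' '.join(re.findall(r'[a-z]+', part)))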
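The lowercase filter silently drops capitalised words (for "(S (CLAUSE I/PRP agree/VBP) ./.)" it keeps only "agree"). If that matters, one alternative is to use the bracket structure instead of the letter case. The sketch below is only an illustration under assumptions: it expects the printed format shown in the question (a label right after each "(", leaves written as word/TAG), and the helpers parse_tree, leaves and clause_texts are hypothetical names, not part of any library. If you still have the original NLTK Tree object, iterating over tree.subtrees(lambda t: t.label() == 'CLAUSE') and reading each subtree's leaves() should avoid re-parsing the text altogether.

```python
import re

def parse_tree(text):
    """Parse bracketed text like '(S (NP a/DT b/NN) ./.)' into nested lists.

    Each node becomes [label, child, ...]; leaves stay as 'word/TAG' strings.
    """
    tokens = re.findall(r'[()]|[^\s()]+', text)
    root = []            # collects top-level items
    stack = [root]
    for tok in tokens:
        if tok == '(':
            node = []
            stack[-1].append(node)
            stack.append(node)
        elif tok == ')':
            stack.pop()
        else:
            stack[-1].append(tok)  # the label (first item of a node) or a leaf
    return root

def leaves(node):
    """Yield the word/TAG leaves below a node, skipping node labels."""
    for child in node[1:]:
        if isinstance(child, list):
            yield from leaves(child)
        else:
            yield child

def clause_texts(text, label='CLAUSE'):
    """Return one plain-text string per subtree whose label equals `label`."""
    found = []

    def walk(node):
        if node and node[0] == label:
            # strip the '/TAG' suffix from every leaf of this clause
            found.append(' '.join(leaf.rsplit('/', 1)[0] for leaf in leaves(node)))
        for child in node[1:]:
            if isinstance(child, list):
                walk(child)

    for item in parse_tree(text):
        if isinstance(item, list):
            walk(item)
    return found

s = """(S
  (CLAUSE
    (VP
      (VP they/PRP were/VBD delivered/VBN promptly/RB)
      and/CC
      (VP a/DT very/RB))
    (NP (NP good/JJ value/NN) and/CC (NP excellent/NN)))
  (CLAUSE
    (VP all/DT)
    (NP (NP around/IN (NP slipper/NN)) (NP with/IN (NP traction/NN))))
  ./.)"""

for clause in clause_texts(s):
    print(clause)
# prints:
# they were delivered promptly and a very good value and excellent
# all around slipper with traction
```

Because it works on the tree rather than on letter case, clause_texts("(S (CLAUSE I/PRP agree/VBP) ./.)") returns ["I agree"].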

Upvotes: 2
