Reputation: 70
My problem is: I have many sentences from many documents, and for every sentence I have to write a CFG using NLTK in Python, like this:
grammar1 = nltk.CFG.fromstring("""
S -> NP VP
VP -> V NP | V NP PP
PP -> P NP
V -> "saw" | "ate" | "walked"
NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
Det -> "a" | "an" | "the" | "my"
N -> "man" | "dog" | "cat" | "telescope" | "park"
P -> "in" | "on" | "by" | "with"
""")
Instead of writing a grammar by hand for every sentence, I want to generate it automatically. I'm stuck at this. Please help me overcome it.
Upvotes: 1
Views: 3577
Reputation: 50200
If you have one or more parsed sentences, you can extract a CFG that describes them by calling the method productions() on the parsed sentence object (an nltk.Tree). Here's an example with the first 10 sentences of the Penn Treebank corpus:
>>> ruleset = set(rule for tree in nltk.corpus.treebank.parsed_sents()[:10]
...               for rule in tree.productions())
>>> for rule in ruleset:
...     print(rule)
NP -> PRP
NP -> DT JJ NN
VP -> VBN S
ADVP-TMP -> RB
IN -> 'among'
NNP -> 'Corp.'
NP -> PRP$ NN NN NNS
NP-SBJ -> DT
RRC -> ADVP-TMP VP
NNP -> 'Journal'
VP -> VBN NP
...
The above will give you 278 rules (including vocabulary items) for those 10 sentences, but coverage improves as your sample grows. You can take it from there.
Of course if your sentences aren't parsed yet, you'll first need to parse them.
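As an end-to-end sketch of "taking it from there" — using a single hand-written parse tree instead of the Treebank, so it runs without downloading the corpus — the extracted productions can be collected into an nltk.CFG and used to parse new sentences:

```python
import nltk

# A hand-parsed sentence (a made-up example, not from the Treebank),
# in the bracketed format accepted by nltk.Tree.fromstring().
tree = nltk.Tree.fromstring(
    "(S (NP (Det the) (N dog)) (VP (V saw) (NP (Det a) (N cat))))")

# Extract the CFG rules (including lexical ones) that this tree instantiates.
rules = tree.productions()
for rule in rules:
    print(rule)

# Collect the rules into a grammar and parse a new sentence with it.
grammar = nltk.CFG(nltk.Nonterminal("S"), rules)
parser = nltk.ChartParser(grammar)
for parse in parser.parse("a cat saw the dog".split()):
    print(parse)
```

With many trees you would pool the productions from all of them (as in the set comprehension above) before building the grammar.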
Upvotes: 1