sudh
sudh

Reputation: 1105

extract a sentence using python

I would like to extract the exact sentence if a particular word is present in that sentence. Could anyone let me know how to do it with python. I used concordance() but it only prints lines where the word matches.

Upvotes: 3

Views: 6697

Answers (3)

Chris Pfohl
Chris Pfohl

Reputation: 19074

Just a quick reminder: Sentence breaking is actually a pretty complex thing, there's exceptions to the period rule, such as "Mr." or "Dr." There's also a variety of sentence ending punctuation marks. But there's also exceptions to the exception (if the next word is Capitalized and is not a proper noun, then Dr. can end a sentence, for example).

If you're interested in this (it's a natural language processing topic) you could check out:
the natural language tool kit's (nltk) punkt module.

Upvotes: 4

Devin
Devin

Reputation: 2133

dutt did a good job answering this. just wanted to add a couple things

import re

text = "go directly to jail. do not cross go. do not collect $200."
pattern = "\.(?P<sentence>.*?(go).*?)\."
match = re.search(pattern, text)
if match != None:
    sentence = match.group("sentence")

obviously, you'll need to import the regex library (import re) before you begin. here is a teardown of what the regular expression actually does (more info can be found at the Python re library page)

\. # looks for a period preceding sentence.
(?P<sentence>...) # sets the regex captured to variable "sentence".
.*? # selects all text (non-greedy) until the word "go".

again, the link to the library ref page is key.

Upvotes: -1

dutt
dutt

Reputation: 8209

If you have each sentence in a string you can use find() on your word and if found return the sentence. Otherwise you could use a regex, something like this

pattern = "\.?(?P<sentence>.*?good.*?)\."
match = re.search(pattern, yourwholetext)
if match != None:
    sentence = match.group("sentence")

I havent tested this but something along those lines.

My test:

import re
text = "muffins are good, cookies are bad. sauce is awesome, veggies too. fmooo mfasss, fdssaaaa."
pattern = "\.?(?P<sentence>.*?good.*?)\."
match = re.search(pattern, text)
if match != None:
    print match.group("sentence")

Upvotes: 1

Related Questions