Reputation: 399
I am trying to write a Python script that searches a document for a keyword and retrieves the entire sentence containing it. From my research I saw that acora can be used, but I have not managed to get it working.
Upvotes: 3
Views: 18103
Reputation: 33387
>>> text = """Hello, this is the first sentence. This is the second.
And this may or may not be the third. Am I right? No? lol..."""
>>> import re
>>> s = re.split(r'[.?!:]+', text)
>>> def search(word, sentences):
...     return [i for i in sentences if re.search(r'\b%s\b' % re.escape(word), i)]
...
>>> search('is', s)
['Hello, this is the first sentence', ' This is the second']
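Since the question is about a document rather than a string, here is a minimal sketch of the same split-and-search idea applied to a file. The function names `split_sentences` and `search_file` are made up for illustration, and it assumes the document is a plain-text file that fits in memory.

```python
import re

def split_sentences(text):
    """Split on runs of . ? ! : as in the session above, dropping empty pieces."""
    pieces = re.split(r'[.?!:]+', text)
    return [piece.strip() for piece in pieces if piece.strip()]

def search_file(path, word, encoding='utf-8'):
    """Return the sentences in the file at `path` that contain `word` as a whole word."""
    with open(path, encoding=encoding) as handle:
        text = handle.read()
    # re.escape keeps characters such as '+' or '.' in the keyword literal.
    pattern = re.compile(r'\b%s\b' % re.escape(word))
    return [sentence for sentence in split_sentences(text) if pattern.search(sentence)]
```

Calling `search_file('document.txt', 'is')` would return a list of matching sentences; the terminating punctuation is lost because the split consumes it.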
Upvotes: 4
Reputation: 1185
Use the grep or egrep commands with Python's subprocess module; that may help you.
For example:
from subprocess import Popen, PIPE
stdout = Popen("grep 'word1' document.txt", shell=True, stdout=PIPE).stdout
# To search for two different words:
# stdout = Popen("egrep 'word1|word2' document.txt", shell=True, stdout=PIPE).stdout
data = stdout.read().decode()  # bytes on Python 3, so decode before splitting
lines = data.split('\n')       # one matching line per element
Upvotes: 0
Reputation: 1225
I don't have much experience with this, but you might be looking for nltk.
Try this: use span_tokenize and find which span the index of your word falls under, then look that sentence up.
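The span lookup described here can be sketched roughly as follows. To keep it self-contained, this uses `re.finditer` from the standard library as a stand-in for nltk's `span_tokenize`, so the sentence boundaries are much cruder than what nltk would give. The names `sentence_spans` and `sentences_containing` are hypothetical.

```python
import re

def sentence_spans(text):
    """Yield (start, end) character offsets of rough sentences in text."""
    # A "sentence" here is a run of non-terminator characters followed by
    # any terminators; a real tokenizer would handle abbreviations etc.
    for match in re.finditer(r'[^.?!]+[.?!]*', text):
        yield match.start(), match.end()

def sentences_containing(text, keyword):
    """Return each sentence whose span covers an occurrence of keyword."""
    spans = list(sentence_spans(text))
    pattern = re.compile(r'\b%s\b' % re.escape(keyword), re.IGNORECASE)
    found = []
    for hit in pattern.finditer(text):
        # Find the span that the keyword's start index falls under.
        for start, end in spans:
            if start <= hit.start() < end:
                sentence = text[start:end].strip()
                if sentence not in found:
                    found.append(sentence)
                break
    return found
```

Working with spans rather than split strings keeps the original punctuation and offsets, which is handy if the match later needs to be highlighted in the document.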
Upvotes: 0
Reputation: 45196
Here is a simple way to do it in the interactive shell. You can turn it into a script yourself.
>>> text = '''this is sentence 1. and that is sentence
2. and sometimes sentences are good.
when that's sentence 4, there's a good reason. and that's
sentence 5.'''
>>> for line in text.split('.'):
...     if 'and' in line:
...         print(line)
...
and that is sentence 2
and sometimes sentences are good
and that's sentence 5
Here I split text with .split('.') and iterated over the pieces, checked each one for the word and, and printed it if it contained that word.
You should also keep in mind that this is case-sensitive. There are many other things to consider in your solution, for example that text ending with ! and ? also forms sentences (but sometimes it doesn't).
This is a sentence (ha?) or do you think (!) so?
is going to be split as
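To see the problem concretely, here is a small sketch (not the answer author's code) that splits that example in two ways: once at every terminator character, and once only at whitespace that follows a terminator. The second pattern is just one possible heuristic and will still break on abbreviations such as "e.g. this".

```python
import re

text = "This is a sentence (ha?) or do you think (!) so?"

# Naive approach: break at every terminator character, wherever it appears.
naive = re.split(r'[.?!]', text)

# Slightly more careful: only break at whitespace that follows a terminator,
# so "(ha?)" and "(!)" stay inside their sentence.
careful = re.split(r'(?<=[.?!])\s+', text)

print(naive)
print(careful)
```

The naive split yields `['This is a sentence (ha', ') or do you think (', ') so', '']`, while the whitespace-aware split leaves the sentence in one piece.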
Upvotes: 0