Ryan

Reputation: 399

Search keyword in document in python

I am trying to write a Python script that searches a document for a keyword and retrieves the entire sentence containing it. From my research I saw that acora can be used, but I could not get it to work.

Upvotes: 3

Views: 18103

Answers (4)

JBernardo

Reputation: 33387

>>> text = """Hello, this is the first sentence. This is the second. 
And this may or may not be the third. Am I right? No? lol..."""

>>> import re
>>> s = re.split(r'[.?!:]+', text)
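>>> def search(word, sentences):
...     # re.escape keeps regex metacharacters in the keyword from being interpreted
...     return [i for i in sentences if re.search(r'\b%s\b' % re.escape(word), i)]
...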

>>> search('is', s)
['Hello, this is the first sentence', ' This is the second']
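Note that re.split drops the punctuation that ended each sentence. If you want the sentence back with its terminator, one option is to match sentences instead of splitting on them. This is only a sketch: find_sentences is a name I made up, and the pattern still treats every ., ? or ! as a sentence end.

```python
import re

def find_sentences(keyword, text):
    """Return sentences containing `keyword` as a whole word, terminators kept."""
    # Each match is a run of non-terminators followed by any run of . ? !
    sentences = re.findall(r'[^.?!]+[.?!]*', text)
    pattern = re.compile(r'\b%s\b' % re.escape(keyword))
    return [s.strip() for s in sentences if pattern.search(s)]

text = """Hello, this is the first sentence. This is the second.
And this may or may not be the third. Am I right? No? lol..."""

print(find_sentences('is', text))
```

"this" in the third sentence is not reported because \b requires "is" to stand as a separate word.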

Upvotes: 4

Yajushi

Reputation: 1185

You can call the grep or egrep command through Python's subprocess module; it may help you.

For example:

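from subprocess import Popen, PIPE

# Pass the command as a list (no shell needed); universal_newlines=True returns str, not bytes
proc = Popen(["grep", "word1", "document.txt"], stdout=PIPE, universal_newlines=True)
# To search for 2 different words:
# proc = Popen(["grep", "-E", "word1|word2", "document.txt"], stdout=PIPE, universal_newlines=True)
data, _ = proc.communicate()
lines = data.splitlines()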
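If grep is not available (on Windows, for example), roughly the same line filtering can be sketched in pure Python. grep_lines is a hypothetical helper, not a library function, and Python's re syntax is close to but not identical to egrep's. Like grep, it returns matching lines, not sentences.

```python
import re

def grep_lines(pattern, lines):
    """Return the lines (newline stripped) in which `pattern` is found."""
    regex = re.compile(pattern)
    return [line.rstrip("\n") for line in lines if regex.search(line)]

# Usage with a file (document.txt is the file name used in the grep call above):
# with open("document.txt") as handle:
#     matches = grep_lines(r"word1|word2", handle)
```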

Upvotes: 0

nattofriends

Reputation: 1225

I don't have much experience with this, but you might be looking for nltk.

Try this: use span_tokenize to get the character span of each sentence, find which span the index of your word falls in, then look up that sentence.
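To show the span idea without requiring nltk, here is a rough sketch that builds the spans with a plain regex instead. sentence_spans and sentence_containing are names I made up; with nltk, the (start, end) pairs would come from the tokenizer's span_tokenize rather than from this regex.

```python
import re

def sentence_spans(text):
    """Return (start, end) character offsets of each sentence-like chunk."""
    # Stand-in for a real sentence tokenizer: text up to a run of . ? !
    return [m.span() for m in re.finditer(r'[^.?!]+[.?!]*', text)]

def sentence_containing(keyword, text):
    """Return the sentence holding the first occurrence of `keyword`, or None."""
    index = text.find(keyword)
    if index == -1:
        return None
    for start, end in sentence_spans(text):
        if start <= index < end:
            return text[start:end].strip()
    return None

text = "First one here. The keyword lives here! Last one."
print(sentence_containing("keyword", text))
```

This only looks at the first occurrence; looping over re.finditer for the keyword would cover all of them.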

Upvotes: 0

ahmet alp balkan

Reputation: 45196

Here is a simple way to do it in the interactive shell. You can turn it into a script yourself.

>>> text = ("this is sentence 1. and that is sentence 2. "
...         "and sometimes sentences are good. "
...         "when that's sentence 4, there's a good reason. and that's sentence 5.")
>>> for line in text.split('.'):
...     if 'and' in line:
...         print(line)
... 
 and that is sentence 2
 and sometimes sentences are good
 and that's sentence 5

Here I split the text with .split('.'), iterated over the pieces, checked whether each one contains the word and, and printed it if it does.

Keep in mind that this check is case-sensitive. There are many other things to consider in your solution, such as the fact that text ending with ! or ? is also a sentence (but sometimes it isn't). For example:

This is a sentence (ha?) or do you think (!) so?

is going to be split as

  • This is a sentence (ha
  • ) or do you think (
  • ) so
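One way to soften both problems, as a sketch only: split only where a terminator is followed by whitespace, and match the keyword case-insensitively. split_sentences and search are illustrative names, and this heuristic still breaks on abbreviations such as "e.g. " in the middle of a sentence.

```python
import re

def split_sentences(text):
    """Split after . ! or ? only when whitespace follows the terminator."""
    # "(ha?)" stays intact because the ? is followed by ")" and not a space
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

def search(keyword, text):
    """Return sentences containing `keyword` as a whole word, ignoring case."""
    pattern = re.compile(r'\b%s\b' % re.escape(keyword), re.IGNORECASE)
    return [s for s in split_sentences(text) if pattern.search(s)]

print(split_sentences("This is a sentence (ha?) or do you think (!) so?"))
print(search("AND", "this is sentence 1. and that is sentence 2."))
```

With this rule the parenthesised example above stays in one piece instead of being cut into three.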

Upvotes: 0
