Reputation: 436
I have the same problem that was discussed in this link: Python extract sentence containing word, but with one difference: I want to find 2 words in the same sentence. I need to extract sentences from a corpus that contain 2 specific words. Could anyone help me, please?
Upvotes: 0
Views: 6444
Reputation: 1714
This is simple using the TextBlob package together with Python's built-in sets.
Basically, iterate through the sentences of your text and check if there is an intersection between the set of words in the sentence and your set of search words.
from textblob import TextBlob

search_words = set(["buy", "apples"])
blob = TextBlob("I like to eat apple. Me too. Let's go buy some apples.")
matches = []
for sentence in blob.sentences:
    words = set(sentence.words)
    if search_words & words:  # intersection
        matches.append(str(sentence))
print(matches)
# ["Let's go buy some apples."]
Update: Or, more Pythonically,
from textblob import TextBlob

search_words = set(["buy", "apples"])
blob = TextBlob("I like to eat apple. Me too. Let's go buy some apples.")
matches = [str(s) for s in blob.sentences if search_words & set(s.words)]
print(matches)
# ["Let's go buy some apples."]
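For completeness, the same intersection idea works with just the standard library, at the cost of a cruder sentence splitter (a rough sketch that assumes sentences end in '.', '!' or '?'):

```python
import re

search_words = {"buy", "apples"}
text = "I like to eat apple. Me too. Let's go buy some apples."

# Crude sentence split: break after ., ! or ? followed by whitespace.
sentences = re.split(r"(?<=[.!?])\s+", text)

# Keep sentences whose word set intersects the search words.
matches = [s for s in sentences
           if search_words & set(re.findall(r"[A-Za-z']+", s))]
print(matches)
# ["Let's go buy some apples."]
```

TextBlob's sentence splitter is more robust (it handles abbreviations, for example), but for simple period-delimited text this avoids the extra dependency.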
Upvotes: 2
Reputation: 3523
If this is what you mean:
import re

txt = "I like to eat apple. Me too. Let's go buy some apples."
define_words = 'some apple'
print(re.findall(r"([^.]*?%s[^.]*\.)" % define_words, txt))
Output: [" Let's go buy some apples."]
You can also read the words from the user:
define_words = input("Enter string: ")
Check if the sentence contains the defined words:
import re

txt = "I like to eat apple. Me too. Let's go buy some apples."
words = 'go apples'.split(' ')
sentences = re.findall(r"([^.]*\.)", txt)
for sentence in sentences:
    if all(word in sentence for word in words):
        print(sentence)
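One caveat: `word in sentence` is a substring test, so searching for 'apple' would also match a sentence that only contains 'apples'. A word-boundary regex avoids this (a small sketch along the same lines, still using only the standard library):

```python
import re

txt = "I like to eat apple. Me too. Let's go buy some apples."
words = ["go", "apples"]
sentences = re.findall(r"[^.]*\.", txt)

# \b anchors require whole-word matches: 'apple' will not match inside 'apples'.
matches = [s.strip() for s in sentences
           if all(re.search(r"\b%s\b" % re.escape(w), s) for w in words)]
print(matches)
# ["Let's go buy some apples."]
```

`re.escape` guards against search words that contain regex metacharacters.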
Upvotes: 2
Reputation: 10278
I think you want an answer using NLTK. And I guess those 2 words don't need to be consecutive, right?
>>> from nltk.tokenize import sent_tokenize
>>> text = "I like to eat apple. Me too. Let's go buy some apples."
>>> words = ['like', 'apple']
>>> sentences = sent_tokenize(text)
>>> for sentence in sentences:
...     if all(word in sentence for word in words):
...         print(sentence)
...
I like to eat apple.
Upvotes: 1